ABSTRACT:  In  informal  exposition,  the  correctness  of  a
complex  algorithm  is  often  demonstrated  by  deriving  it  through 
successive  refinement  steps  from  a  high  level  specification,  and 
supplying  proofs  of  the  underlying  principles  used  in  the  process. 
However,  most  existing  mechanical  program  verifiers  ignore  this 
standard  expository  practice,  and  are  generally  designed  to  verify 
programs  written  in  a  low  level  form.  While  logically  simple 
algorithms  can  be  handled  adequately  in  this  manner,  attempting  to 
verify  more  complex  algorithms  at  a  low  level  requires  treatment 
of  implementation  details  which  obscure  the  main  arguments  of  the 
verification. 

This  thesis  describes  a  systematic  technique  for  proving 
algorithms  correct  using  a  transformational  approach,  and  presents 
a  detailed  transformation/verification  scenario  of  the  proof  of  a 
variety  of  complex  combinatorial  algorithms.  The  algorithms 
treated  here  are  considerably  more  involved  than  those  verified  by 
other  methods. 

The  programming  language  used  is  a  variant  of  SETL,  adapted 
for  program  verification,  which  provides  a  medium  for  high  level 
specification.  A  program  P  is  annotated  with  logical  formulae  of 
set  theory,  which  are  called  assumptions  and  assertions.  P  is
said  to  be  partially  correct  if  every  computation  which  satisfies
all  assumptions  also  satisfies  all  assertions.  In  order  to  prove 
the  correctness  of  P,  which  initially  contains  only  assumptions, 
we  apply  proof  rules  which  are  used  both  to  transform  the  program 
into  logical  formulae  called  verification  conditions  and  then  to 
prove  these  verification  conditions. 

The  transformation  rules  are  unique  in  that  they  enable  the 
combination  of  correct  program  fragments.  We  are  able  to  reuse 
general  code  fragments  in  a  variety  of  contexts  without  reproof 
and  to  derive  several  different  low  level  algorithms  from  a  single 
high  level  algorithm.  The  transformations  often  require  proof  of 
enabling  conditions.  In  such  cases,  when  a  transformation  is 
performed,  the  enabling  condition  is  introduced  into  the  program 
text  as  an  assumption  which  must  be  verified  in  turn  using  the 
proof  mechanism  described  above. 
CHAPTER  1
Introduction 


In  informal  exposition,  the  correctness  of  a  complex 
algorithm  is  often  demonstrated  by  deriving  it  through 
successive  refinement  steps  from  a  high  level  specification,  and 
supplying  proofs  of  the  underlying  principles  used  in  the 
process.  However,  most  existing  mechanical  program  verifiers 
ignore  this  standard  expository  practice,  and  are  generally 
designed  to  verify  programs  written  in  a  low  level  form.  While 
logically  simple  algorithms  can  be  handled  adequately  in  this 
manner,  we  believe  that  attempting  to  verify  more  complex 
algorithms  at  a  low  level  will  be  unproductive.  In  particular,
proof  of  a  low  level  algorithm  often  requires  treatment  of
implementation  details  which  obscure  the  main  arguments  of  the
verification.

In  this  thesis  we  will  describe  a  systematic  technique  for 
proving  algorithms  correct  using  a  transformational  approach. 
We  demonstrate  its  feasibility  by  presenting  a  detailed 
transformation/verification  scenario  of  the  proof  of  a  variety 
of  complex  combinatorial  algorithms. 
1.1  Background

Several  approaches  for  the  development  of  reliable  programs 
are  used  in  existing  systems  for  formally  verifying  programs. 
These  include  the  Stanford  PASCAL  Verifier  [GKL79]  and  the  PL/CV 
system  [COD78]  for  verifying  PL/C  programs.  Verification  of  a 
program  with  either  of  these  systems  proceeds  as  follows.  The 
user  writes  a  PASCAL  or  PL/C  program  and  annotates  it  with  an 
input  assumption,  which  states  restrictions  on  valid  input,  and 
an  output  assertion,  which  describes  properties  of  variables  at 
execution  termination.  The  assertions  are  logical  formulae,  the 
free  variables  of  which  are  program  variables.  For  all  program 
loops,  the  user  must  also  supply  loop  invariant  assertions, 
which  serve  as  inductive  hypotheses.  The  verifier  system  either 
accepts  the  annotated  program  as  correct  with  respect  to  the 
assertions,  or  rejects  it,  giving  appropriate  diagnostics, 
whereupon  the  user  must  repair  either  the  faulty  program  or  the 
faulty  assertions. 

The  two  systems  we  have  mentioned  differ  significantly  in 
their  proof  technique.  The  PASCAL  verifier  propagates 
assertions  through  the  program  and  generates  a  set  of  formulae, 
called  verification  conditions,  which  must  then  be  verified. 
The  actual  program  is  thereby  eliminated  from  consideration 
after  the  first  phase,  and  the  second  phase  involves  only 
theorem  proving.  The  PASCAL  verifier  assertion  language  is 
restricted  in  ways  which  allow  fast  decision  procedures  to  be
used  to  simplify  the  verification  conditions  [NO78].  In
particular,  the  verification  conditions  are  quantifier  free  and 
contain  predicate  symbols  which  are  uninterpreted.  Rewrite 
rules  are  supplied  by  the  user  to  direct  the  simplification  of 
formulae  containing  these  uninterpreted  predicate  symbols. 

In  contrast,  in  the  PL/CV  system,  first  order  predicate 
logic  is  extended  to  include  programming  language  constructs. 
Thus  PL/CV  works  with  a  "programming  logic".  This  makes  it
unnecessary  to  extract  logical  formulae  from  the  program  text; 
instead  the  entire  annotated  program  is  treated  formally  as  a 
proof  argument.  Proofs  are  guided  by  directives  specifying 
which  proof  rules  to  apply;  these  directives  are  included  in 
the  text  of  the  program  to  be  proved  correct. 

While  many  simple  programs  (and  some  larger  ones)  have  been 
verified  with  these  systems,  we  feel  these  systems  take  an 
inappropriately  low-level  and  static  view  of  algorithms.  This 
has  many  disadvantages,  in  particular  the  imposition  of  tedious 
repetition  on  the  system  user.  Many  common  program  structures, 
all  related  to  some  single  more  abstract  version  of  the 
algorithm,  must  be  handled  manually  by  the  user  of  these
systems.  To  avoid  this  repetitive  consideration  of  common 
abstract  themes,  one  needs  to  make  use  of  programming  languages 
of  higher  level  than  either  PASCAL  or  PL/I.  Moreover, 
formulating  the  inductive  assertions  required  to  verify 
algorithms  in  a  low  level  form  often  involves  reconstructing  the
high  level  "root"  algorithm  from  which  these  low-level  forms
descend.  This  process  is  comparable  to  decompilation,  and 
therefore  is  harder  than  transformational  derivation,  which  is 
comparable  to  compilation. 

Work  on  higher  level  languages  has  not  evolved  directly 
from  formal  program  verification  efforts.  Nevertheless,  this 
work  does  contribute  to  verification  technology.  Indeed,  a  high 
level  language  allows  algorithms  to  be  expressed  clearly  and 
succinctly  in  an  abstract  form.  Specific  choices  of  data 
representation  and  other  implementation  related  details  need  not 
be  explicitly  stated.  An  abstract  algorithm  specification 
generally  lies  closer  to  the  mathematical  facts  needed  to 
justify  it  than  a  more  detail-rich  low  level  form  would. 
Moreover,  once  verified,  a  high  level  algorithm  can  be  compiled 
or  transformed  into  an  efficient  low  level  version.  The 
correctness  of  its  compiled  form  will  then  follow  from  the 
correctness  of  its  abstract  form  and  of  the  compiler  (or 
semi-automatic  transformational  steps)  used  to  obtain  this  final 
form. 

However,  the  transformation  of  the  first  root  form  of  an 
algorithm  into  an  efficient  low  level  version  will  generally 
involve  several  layers  of  transformation,  of  which  only  the  last 
is  stereotyped  enough  to  be  handled  in  completely  automatic 
fashion  by  a  compiler.  Recent  work  on  transformational 
programming  has  begun  to  clarify  the  structure  of  the  less
stereotyped,  higher  level  transformations  which  are  typically
applied  first  to  generate  constructs  commonly  occurring  in 
programs  of  the  PL/I  or  PASCAL  level.  Computerized  systems 
allowing  the  most  important  of  these  transformations  (including 
recursion  removal  [BD77]  and  abstract  set  theoretic  strength 
reduction  [Pa79])  to  be  performed  semi-automatically  have  been
proposed. 

However,  in  spite  of  the  possibility  of  elaborating 
transformational  tools  of  this  sort,  it  is  unlikely  that  they 
will  be  able  to  deal  with  efficient  algorithms  which  depend  on 
ingenious  tricks  and  complex  logical  arguments.  For  this,  more 
general  program  transformations  and  more  detailed  user  guidance 
will  be  required.  It  is  this  family  of  more  general  techniques 
upon  which  the  present  thesis  concentrates. 

Fundamental  to  the  rationale  of  our  approach  is  the 
observation  that  there  exist  elementary  algorithmic  techniques 
of  wide  applicability,  and  that  algorithms  can  be  seen  as 
combinations  of  such  components.  Once  identified  and  proven, 
such  algorithm  components  can  be  reused  in  a  variety  of  contexts 
without  reproof.  Thus,  an  important  feature  of  our  system  will 
be  the  ability  to  combine  proven  program  elements  into  new 
elements  which  can  be  incorporated  into  a  growing  library  of 
proven  algorithms. 

The  current  work  is  an  outgrowth  of  the  ideas  presented  in
a  paper  by  J.T.  Schwartz  [DS77],  who  advocates  the  use  of
source-to-source  transformational  techniques  as  a  tool  for 
program  verification,  the  use  of  a  high  level  set  theoretic 
programming  language,  and  the  use  of  set  theory  as  the 
verification  logic.  He  describes  a  variant  of  the  inductive 
assertion  formalism,  which  is  suited  to  proving  algorithm 
derivations  by  transformation,  and  he  illustrates  various 
correctness  preserving  transformations  and  algorithm  derivation 
techniques.  This  thesis  will  refine  and  extend  the  original 
proposals  of  Schwartz.  Note  that  similar  ideas  have  begun  to  be 
actively  elaborated  in  the  work  of  Bauer  [Ba76],  [Ba78],  [Ba79]
and  his  collaborators.  Standish  et  al.  [SHKN76]  have  collected  a
catalogue  of  source-to-source  transformations  for  an  ALGOL-like
language.  Before  Schwartz,  Gerhart  [Ger75],  [Ger75a]  also
suggested  the  strategy  of  collecting  general  program  schemas,
and  using  source-to-source  transformations  to  derive  specific
algorithms  from  them.  Systems  which  perform  source-to-source
transformations  include  those  of  Arsac  [Ar79],  Balzer  et  al.
[BGW75],  Kibler  and  Standish  [Kib79],  and  Loveman  [Lov77].

In  the  first  part  of  this  thesis,  we  specify  our  language 
and  rules.  Our  programming  language,  described  in  Chapter  2,  is 
a  SETL-like  language  [Dew79],  adapted  for  program  verification.
The  semantics  of  the  language  are  then  specified  by  (i)  defining 
certain  dictions  in  terms  of  simpler  more  basic  dictions,  using 
compilation  transformations  (in  section  2.2),  and  (ii)  stating 
proof  rules  for  basic  dictions  (in  section  3.2).  Additionally,
Chapter  3  discusses  our  program  proof  technique,  including  the
issues  of  program  termination  and  proof  checking.  In  Chapter  4,
we  present  a  set  of  correctness  preserving  program
transformation  rules,  which  facilitate  the  combination  of
program  components.

The  second  part  of  this  thesis  consists  of  examples, 
demonstrating  the  use  of  our  rules.  In  section  1.5,  we  present 
a  quite  simple  motivating  example  involving  search  algorithms. 
However,  the  test  of  an  effective  verification  system,  as  well 
as  the  difficulty  of  full  formal  verification,  is  found  in  its
handling  of  non-trivial  algorithms.  For  this  reason,  we  have 
devoted  Chapter  5  to  a  detailed  derivation  and  proof  of  a 
complex  algorithm,  Tarjan's  flowgraph  reducibility-test 
algorithm.  (Several  other  examples  of  intermediate  difficulty 
are  described  in  section  4.2  to  illustrate  our  transformation 
rules.)  In  section  5.1  we  give  a  detailed  derivation  from  a  high 
level  specification,  stating  lemmas  (without  proof)  to  justify 
the  transformation  steps.  In  section  5.2,  we  give  a  formal 
proof  of  these  lemmas.  We  then  estimate  the  number  of  proof 
steps  that  would  be  required  to  prove  the  algorithm  using  a 
proof  checker  to  be  roughly  2,000.  This  is  at  the  edge  of 
state-of-the-art  proof  technology,  and  shows  that  a
verification  system  based  on  a  low  level  language  and  logic  is
not  likely  to  be  viable.  Finally,  in  section  5.3,  we  discuss  a
proof  of  the  same  algorithm  which  does  not  use  transformations, 
and  argue  that  in  their  absence,  proof  of  an  algorithm  of  such
complexity  is  considerably  harder,  and  in  fact  requires  the
retracing  of  the  transformational  proof  steps. 

1.2  Verification  Formalism 

Having  given  this  background,  let  us  now  describe  our 
proposed  transformation/verification  framework  for  the 
derivation  of  correct  programs.  The  elements  which  the  system 
manipulates  are  program  texts  annotated  with  logical 
propositions.  A  proposition  P  can  be  associated  with  any  place 
@  in  the  program.  The  language  in  which  these  propositions  are 
to  be  written  will  be  set  theory;  the  programming  language  will 
include  set  theoretic  expressions  and  standard  control 
structures.  (Details  concerning  the  programming  language  are 
given  in  Chapter  2.)  The  free  variables  of  the  propositions  with 
which  a  program  is  annotated  are  required  to  be  program 
variables,  so  that  each  program  proposition  describes  properties 
of  the  program  variables  at  a  particular  program  point  during 
its  execution.  Program  variables  include  the  variables  used 
explicitly  in  the  program  plus  an  indefinite  collection  of 
additional  variables.  (For  additional  details  concerning  this 
formalism  see  Schwartz  [DS77].) 

Each  program  proposition  P  is  classified  either  as  an
assumption,  indicated  by  writing  |=  P,  or  as  an  assertion,
indicated  by  writing  |-  P.  A  program  annotated  with  assumptions
and  assertions  will  be  called  a  praa.

We  define  the  semantics  and  (partial)  correctness  of  a
program  in  terms  of  the  computation  states  which  are  taken  on 
during  execution. 

Specifically,  let  Q  be  a  program. 

Def.  A  state  S  of  Q  is  a  pair  [M,  @],  where  M  is  a  mapping
which  assigns  values  to  all  the  variables  of  Q,  and  where  @  is  a
place  in  the  program  (technically,  a  place  which  immediately
precedes  some  statement  in  the  program).  The  values  which
variables  can  have  are  required  to  be  elements  of  the  standard
model  for  (Zermelo-Fraenkel)  set  theory.

Def.  A  computation  C  for  Q  is  a  finite  (or  infinite)
sequence  of  states  S1,  S2,  ...,  where  each  state  Si+1  is  derived
from  Si  in  the  manner  defined  by  the  semantics  of  the  statement
immediately  following  Si(2),  and  where  S1(2)  is  the  initial  or
entry  place  of  the  program.

Def.  If  S  is  a  state,  where  S(2)  =  @,  and  P  is  a
proposition  associated  with  Q,  with  free  variables  X1,  ...,  Xn,
then  S  satisfies  P  at  @  if  P(S(1)(X1),  S(1)(X2),  ...,  S(1)(Xn))
is  true.

Def.  A  computation  C  satisfies  a  proposition  P  at  a  place
@  if  for  every  state  S  in  C  such  that  S(2)  =  @,  S  satisfies  P.

Def.  A  praa  R  is  said  to  be  partially  correct  if  every
computation  C  of  R  which  satisfies  all  assumptions  of  R  also 
satisfies  all  of  the  assertions  of  R. 

We  will  concern  ourselves  with  partial  correctness.  Thus
even  a  praa  which  does  not  terminate  can  be  partially  correct  in 
the  sense  of  the  preceding  definition. 
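
The definitions above can be illustrated with a small executable sketch. The Python below is not part of the formalism. It models a state as a pair (mapping, place), a proposition as a place together with a predicate on the mapping, and checks the partial correctness condition over an explicitly listed finite set of finite computations. The names `satisfies` and `partially_correct`, and the two-place example program, are assumptions introduced only for illustration.

```python
from typing import Callable, Dict, List, Tuple

State = Tuple[Dict[str, int], str]                     # (mapping M, place @)
Prop = Tuple[str, Callable[[Dict[str, int]], bool]]    # (place, predicate on M)


def satisfies(computation: List[State], prop: Prop) -> bool:
    """A computation satisfies P at a place if every state at that place does."""
    place, pred = prop
    return all(pred(m) for (m, at) in computation if at == place)


def partially_correct(computations, assumptions, assertions) -> bool:
    """Every computation satisfying all assumptions satisfies all assertions."""
    for c in computations:
        if all(satisfies(c, a) for a in assumptions):
            if not all(satisfies(c, s) for s in assertions):
                return False
    return True


# Hypothetical program with two places:  entry: y := x + 1;  exit:
def run(x: int) -> List[State]:
    return [({"x": x, "y": 0}, "entry"), ({"x": x, "y": x + 1}, "exit")]


computations = [run(x) for x in range(-3, 4)]
assumption = ("entry", lambda m: m["x"] >= 0)       # |= x >= 0
assertion = ("exit", lambda m: m["y"] > 0)          # |- y > 0
bad_assertion = ("exit", lambda m: m["y"] > 1)      # fails when x = 0

print(partially_correct(computations, [assumption], [assertion]))
print(partially_correct(computations, [assumption], [bad_assertion]))
```

Computations that violate the assumption (here, those with negative x) are simply ignored, which is why the first check succeeds even though y > 0 fails for x = -1.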

Note  that  our  correctness  definition  differs  technically 
from  the  usual  Floyd/Hoare  definition  in  that  assumptions  and 
assertions  may  appear  at  various  places  in  the  program.  In  the 
Floyd/Hoare  approach,  the  assumptions  usually  appear  only  at  the 
start  of  execution,  and  assertions  only  at  the  end.  We  drop 
this  restriction  to  gain  flexibility  and  to  allow  a  uniform 
treatment  of  program  verification  and  program  transformation 
steps. 

1.3  Set  Theory  Concepts 

In  this  section,  we  give  a  brief  description  of  the 
primitives  of  set  theory,  upon  which  our  verification  formalism 
is  based.  We  enumerate  these  primitives  as  follows:

(a)  The  operators  &,  or,  ->,  <->  of  propositional  calculus,
and  the  quantifiers  A  and  E  of  predicate  calculus,  together  with
all  the  standard  rules  for  manipulating  them,  are  available.  We
also  allow  the  conditional  form:

    if  C1  then  t1  elseif  C2  then  t2  ...  else  tn  end

(b)  The  equality  relation  x  =  y  and  the  membership  relation 
x  in  y  are  available.  Equality  has  all  its  standard  properties,
and  in  addition  we  have  the 

Axiom:   u  =  v  <->   A  x  |  x  in  u  <->  x  in  v 

(c)  We  are  allowed  to  write  set  formers  {e  :  x1,...,xn  |
C},  where  e  is  an  arbitrary  expression,  C  any  predicate,  and
x1,...,xn  any  list  of  variables.  This  syntactic  construct
designates  the  set  of  all  values  that  e  can  take  on  as  x1,...,xn
vary  over  all  values  satisfying  the  condition  C.  We  have
therefore  the

Axiom:  z  in  {e  :  x1,...,xn  |  C}  <->  E  x1,...,xn  |  z  =  e  &  C

The  expressions  e  and  C  are  essentially  arbitrary,  except  that
certain  technical  restrictions,  which  prevent  formation  of
"paradoxical"  sets  (e.g.  the  set  of  all  sets  which  are  not
members  of  themselves),  must  be  respected.  It  is  convenient  to
use  {x  :  C}  to  abbreviate  {x  :  x  |  C}.

(d)  If  f  is  any  function  (resp.  predicate)  symbol  which
has  never  been  used  before,  and  e  is  any  expression  (resp.
predicate)  whose  only  free  variables  are  x1,...,xn,  then  we  can
introduce  the  equality

f(x1,...,xn)  =  e   (resp.  f(x1,...,xn)  <->  e)

as  a  definition.

Many   basic   set    theoretic   definitions   are   entirely 
straightforward: 

{u}             =   {y | y = u}                              Singleton
{u, v}          =   {y | y = u or y = v}                     Pair
{}              =   {y | y ~= y}                             Nullset
0 = {},  1 = {0},  2 = {0, 1}                                Zero, one, two
[x, y]          =   {{0, {x}}, {2, {y}}}                     Ordered pair
u + v           =   {y | y in u or y in v}                   Union
u * v           =   {y | y in u & y in v}                    Intersection
u - v           =   {y | y in u & y ~in v}                   Difference
u subset v     <->  u - v = {}                               Inclusion
pow(u)          =   {y | y subset u}                         Powerset
f{u}            =   {y | [u, y] in f}                        Map application
range f         =   {y : x, y | [x, y] in f}                 Range
dom f           =   {x : x, y | [x, y] in f}                 Domain
restrict(f, s)  =   {[x, y] : x, y | [x, y] in f & x in s}   Map restriction
f[s]            =   range restrict(f, s)                     Range of map on set
f-inv           =   {[y, x] : x, y | [x, y] in f}            Inverse mapping
Un(s)           =   {x : x, y | x in y & y in s}             Union set
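
For finite sets, the map-related definitions in this list can be mirrored directly by set comprehensions. The following Python sketch is an illustration only: it represents the ordered pair [x, y] by a Python tuple rather than by the set-theoretic encoding, it takes map restriction in its usual sense (keeping only pairs whose first component lies in s), and all function names are assumptions chosen for readability.

```python
def dom(f):
    """dom f: first components of the pairs in f."""
    return {x for (x, y) in f}


def rng(f):
    """range f: second components of the pairs in f."""
    return {y for (x, y) in f}


def restrict(f, s):
    """restrict(f, s): pairs of f whose first component lies in s."""
    return {(x, y) for (x, y) in f if x in s}


def image(f, s):
    """f[s]: range of the map restricted to s."""
    return rng(restrict(f, s))


def apply_multi(f, u):
    """f{u}: the set of all values paired with u."""
    return {y for (x, y) in f if x == u}


def inverse(f):
    """f-inv: every pair reversed."""
    return {(y, x) for (x, y) in f}


def big_union(s):
    """Un(s): members of members of s."""
    return {x for y in s for x in y}


f = {(1, "a"), (2, "b"), (3, "a")}
print(dom(f), rng(f), image(f, {1, 3}), apply_multi(f, 2))
print(big_union({frozenset({1, 2}), frozenset({2, 3})}))
```

Python sets are finite and typed by hashability, so this is a model of the finite fragment only; inner sets must be `frozenset` values to be members of another set.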

Under  appropriate  restrictions,  definitions  are  allowed   to 
be  recursive.   Specifically,  let  the  definition  be 

(*)  f(x1,...,xn)  =  r

Then  there  must  exist  some  other  already  defined  function 
a(x1,...,xn),  which  we  shall  call  the  auxiliary  function  of  the
definition  (*),  such  that  every  occurrence  of  f  within  r  is  part
of  a  subexpression

if  a(e1,...,en)  ~in  a(x1,...,xn)  then  {}  else  f(e1,...,en)  end

Similarly,  for  recursive  predicate  definitions

P(x1,...,xn)  <->  r

we  insist  that  every  occurrence  of  P  within  r  be  part  of  a
predicate  subterm

a(e1,...,en)  in  a(x1,...,xn)  ->  P(e1,...,en).

It  should  be  understood  that  equivalent  forms  of  recursive
function  definitions  are  allowed.  In  Chapter  3  we  describe  how
a  proof  checker  might  verify  correct  recursive  definition  forms.

(e)  For  any  predicate  P(x1,...,xn,y)  whose  only  free
variables  are  x1,...,xn,y,  we  can  introduce  a  new  function
symbol  f  of  n  variables  and  a  defining  statement:

A  x1,...,xn  |  P(x1,...,xn,f(x1,...,xn))  <->
                 E  y  |  P(x1,...,xn,y)

(f)  The  necessary  assumption  that  there  exists  at  least  one 
infinite  set  is  formulated  as  the  axiom  of  infinity: 

Axiom:  E  u  |  u  in  Un(u)  &  u  ~=  {}

(g)  A  "choice"  operator  arb  which  selects  a  (fixed)  element
from  each  nonnull  set  is  available,  together  with  the  following
strong  Axiom  of  Choice  (in  which  min  selects  a  kind  of  "minimum"
element):


Axiom:  min/s  *  s  =  {}  &  (arb  s  in  s  or  s  =  {})  &  arb  {}  =  {}

1.4  Correctness  Preserving  Transformations

Optimizing  compilers  perform  program  transformations  in  a 
manner  guided  by  considerations  of  profitability  and  safety. 
Profitable  transformations  include  elimination  of  useless  code, 
motion  of  invariant  code  out  of  loops,  and  replacement  of 
expensive  inner  loop  calculations  by  less  costly  operations.  A 
transformation  can  be  performed  only  when  it  is  safe,  that  is, 
when  it  can  be  determined  that  the  outcome  of  the  program  will 
not  be  altered  in  any  way.  This  same  consideration  of  safety  is 
fundamental  to  the  very  general  transformation  scheme  that  will 
be  described  in  what  follows.  Safety  will  be  guaranteed  by 
imposing  enabling  conditions  on  the  application  of 
transformations.

Enabling  conditions  for  transformations  used  by  optimizing 
compilers  must  be  quite  restrictive  because  only  a  limited 
amount  of  information  can  be  gathered  automatically  in  a 
reasonable  amount  of  time.  Given  programs  annotated  with 
assertions,  and  a  semi-automatic  theorem  proving  capability, 
much  more  general  transformational  possibilities  appear.  In 
particular,  the  verification  capability  we  assume  allows  deeper 
facts  about  the  program  to  be  used  as  enabling  conditions  for 
transformations,  such  as  the  fact  that  two  sets  are  disjoint. 

We  say  that  a  praa  transformation  T  is  correctness 
preserving  if,  for  all  possible  correct  praas  R  to  which  T  can
be  applied,  thereby  producing  R',  R'  is  partially  correct.  If  T
is  correctness  preserving,  we  will  also  say  that  T  is  a  correct 
transformation. 

Our  aim  is  to  define  a  set  of  correct  transformations  which
could  be  used,  in  various  combinations,  to  realize  many
important  algorithm  derivation  techniques,  including  at  least
the  following:

(i)  Transformations  involving  recursion,  such  as  recursion 
removal,  elimination  of  redundancy  by  tabularization,  recursion
introduction,  and  procedure  integration. 

(ii)  Reduction  in  strength  of  set  theoretic  expressions. 

(iii)  Replacement  of  non-deterministic  selection  by 
procedural  code. 

(iv)  Algorithm  combination.

(v)  Modification  of  data  representations.

(vi)  Compilation  of  high  level  dictions  into  expanded  text.

(vii)  Reuse  of  storage  which  is  no  longer  needed.

(viii)  Various  preparatory  and  cleanup  transformations,
such  as  code  motion,  loop  unrolling,  dead  code  elimination,
variable  renaming,  expression  substitution,  and  application  of
algebraic  laws  and  logical  identities.


1.5  An  Introductory  Example:  Searching  a  Sorted  Array

Before  entering  into  technical  details,  we  present  a  simple 
example  to  informally  demonstrate  the  transformational  style 
which  we  envision.  Whenever  the  correctness  of  the 
transformations  we  apply  in  this  example  is  not  intuitively 
clear,  we  will  supply  logical  propositions  to  justify  them. 

We  begin  with  the  following  high  level  algorithm  for 
searching  a  sorted  array;  we  will  assume  this  high-level
version  has  been  verified.  Note  that  this  praa  is  abstract
enough  to  serve  as  the  "root"  formulation  for  various
algorithms,  including  binary  search,  sequential  search,  and  the
Fibonaccian  search  algorithm  described  by  Knuth  [Kn73].

(1)  proc search(left, right) returns j;
       |= sorted(a) & b : int
       if right < left then return false; end;
       i := arb {k : left <= k <= right};
       if a(i) = b then
         return true;
       else
         if a(i) > b then
           return search(left, i-1);
         else
           return search(i+1, right);
         end;
       end;
       |- j <-> (E left <= x <= right | a(x) = b)
     end;

The  praa  (1)  resembles  the  well-known  binary   search   algorithm, 
except  for  the  statement: 

(2)        i  :=  arb  {k:  left  <=  k  <=  right}; 

which  assigns  to  i  an  arbitrary  element  from  a  specified  range.
We  obtain  particular  search  algorithms  by  specifying  different
ways  of  probing  the  sorted  input  array  a;  that  is,  by  replacing
(2)  by  a  particular  procedure  for  obtaining  i.  While  the  way  in
which  i  is  actually  selected  affects  the  efficiency  of  the 
search,  it  can  also  be  regarded  as  an  implementation  detail 
which  is  not  crucial  to  the  correctness  of  the  algorithm. 
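
The claim that the choice of i affects only efficiency can be tried out in an executable setting. The Python sketch below is an illustration, not the thesis's SETL variant: the nondeterministic `arb` is modelled by a `probe` function passed as a parameter, the array `a` and key `b` are passed explicitly rather than being global, and the names `search`, `arbitrary`, `midpoint` and `leftmost` are assumptions made for this example.

```python
import random
from typing import Callable, Sequence

Probe = Callable[[int, int], int]


def search(a: Sequence[int], b: int, left: int, right: int, probe: Probe) -> bool:
    """Recursive search of sorted a[left..right] for b, probing where told."""
    if right < left:
        return False
    i = probe(left, right)
    # Any probe is acceptable provided it stays inside the current range.
    assert left <= i <= right, "probe left the range left..right"
    if a[i] == b:
        return True
    if a[i] > b:
        return search(a, b, left, i - 1, probe)
    return search(a, b, i + 1, right, probe)


def arbitrary(left: int, right: int) -> int:
    return random.randint(left, right)      # stands in for arb; bounds inclusive


def midpoint(left: int, right: int) -> int:
    return (left + right) // 2


def leftmost(left: int, right: int) -> int:
    return left


a = [1, 3, 4, 7, 9, 12]
for probe in (arbitrary, midpoint, leftmost):
    print(probe.__name__, search(a, 7, 0, len(a) - 1, probe),
          search(a, 5, 0, len(a) - 1, probe))
```

Each probe yields the same answers; only the number of recursive calls differs.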

We  begin  the  process  of  transforming  (1)  by  applying 
recursion  removal  to  obtain  an  iterative  algorithm.  This  is  a 
particularly  easy  case  of  recursion  removal  since  the  recursive 
calls  occur  at  the  end  of  the  procedure,  and  can  be  replaced  by 
branches  to  the  first  statement.  Recursion  removal  yields  the 
following  praa  variant. 

(3)  proc search(left0, right0) returns j;
       |= sorted(a) & b : int
       left := left0;
       right := right0;
  L:   if right < left then return false; end;
       i := arb {k : left <= k <= right};
       if a(i) = b then
         return true;
       else
         if a(i) > b then
           right := i - 1;
           go to L;
         else
           left := i + 1;
           go to L;
         end;
       end;
       |- j <-> (E left0 <= x <= right0 | a(x) = b)
     end;

Then,   after   some   cleanup   transformations,   we   obtain   the 
following  version. 

(4)  proc search(left0, right0) returns j;
       |= sorted(a) & b : int
       left := left0;
       right := right0;
       (while right >= left)
         i := arb {k : left <= k <= right};
         if a(i) = b then
           return true;
         else
           if a(i) > b then
             right := i - 1;
           else
             left := i + 1;
           end;
         end;
       end;
       return false;
       |- j <-> (E left0 <= x <= right0 | a(x) = b)
     end;
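
The recursion removal step that leads from (1) to (4) can be checked experimentally on small inputs. In the Python sketch below, which is an illustration under stated assumptions and not a proof, both forms take the probing rule as a parameter `probe`, and the array and key are explicit arguments. The function names and the three sample probes are hypothetical.

```python
def search_recursive(a, b, left, right, probe):
    """Form (1): both recursive calls are in tail position."""
    if right < left:
        return False
    i = probe(left, right)
    if a[i] == b:
        return True
    if a[i] > b:
        return search_recursive(a, b, left, i - 1, probe)
    return search_recursive(a, b, i + 1, right, probe)


def search_iterative(a, b, left0, right0, probe):
    """Form (4): each tail call becomes an update of left or right."""
    left, right = left0, right0
    while right >= left:
        i = probe(left, right)
        if a[i] == b:
            return True
        if a[i] > b:
            right = i - 1
        else:
            left = i + 1
    return False


probes = {
    "midpoint": lambda l, r: (l + r) // 2,
    "leftmost": lambda l, r: l,
    "rightmost": lambda l, r: r,
}
a = [2, 3, 5, 8, 13, 21]
agree = all(
    search_recursive(a, b, 0, len(a) - 1, p)
    == search_iterative(a, b, 0, len(a) - 1, p)
    == (b in a)
    for p in probes.values()
    for b in range(0, 25)
)
print(agree)
```

Agreement on a finite set of inputs is only evidence; the transformation itself is justified by the fact that the recursive calls are the last actions of the procedure.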


Now  we  go  on  to  specialize  this  algorithm  in  various  ways.

1.5.1  Binary  Search

An  obvious  possibility  is  to  replace  the  statement  (2)  by
the  statement

    i  :=  (left  +  right)  div  2;

which   transforms   the   algorithm   (4)   into   a  binary   search 
algorithm.   To  justify  this  transformation,  the  assumption 

(left  +  right)  div  2  in  {k:  left  <=  k  <=  right} 
must  be  verified.   However   this   assumption   follows   from  the 
condition 

left  <=  right. 
After  this  transformation,   we   obtain   the   following   standard 
binary  search  praa. 

(5)  proc search(left0, right0) returns j;
       |= sorted(a) & b : int
       left := left0;
       right := right0;
       (while right >= left)
         i := (left + right) div 2;
         if a(i) = b then return true; end;
         if a(i) > b then
           right := i - 1;
         else
           left := i + 1;
         end;
       end;
       return false;
       |- found <-> (E left0 <= x <= right0 | a(x) = b)
     end;
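
The enabling condition behind the binary search specialization, namely that (left + right) div 2 lies in {k: left <= k <= right} whenever left <= right, can be checked exhaustively over a small range of integers. The Python sketch below does this. It assumes that `div` agrees with Python's floor division `//`, which certainly holds for the non-negative indices used here; the function name and the chosen bounds are illustrative.

```python
def enabling_condition_holds(left: int, right: int) -> bool:
    """Is the midpoint a member of {k : left <= k <= right}?"""
    i = (left + right) // 2
    return left <= i <= right


# Search for counterexamples among all pairs with left <= right in -20..20.
violations = [
    (l, r)
    for l in range(-20, 21)
    for r in range(l, 21)
    if not enabling_condition_holds(l, r)
]
print(violations)

# The hypothesis left <= right is needed: on an empty range the midpoint
# cannot be in the (empty) set.
print(enabling_condition_holds(3, 2))
```

Inside the while loop of the binary search praa the loop test right >= left supplies exactly this hypothesis, which is why the substitution is safe at that program point.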

1.5.2  Sequential  Search 

The  standard  sequential  search  algorithm  can  be   obtained 
from  the  high  level  version  by  substituting  either

    i  :=  left;

or

    i  :=  right;

for  statement  (2).  After  this  substitution  has  been  made,
cleanup  transformations  are  necessary  to  put  the  algorithm  into
a  more  elegant  form.  The  praa  below  is  obtained  by
substituting

    i  :=  left;

for  (2),  and  replacing  subsequent  uses  of  i  by  uses  of  left.

(6)  proc search(left0, right0) returns j;
       |= sorted(a) & b : int
       left := left0;
       right := right0;
       (while right >= left)
         i := left;
         if a(left) = b then return true; end;
  @0     if a(left) > b then
  @1       right := left - 1;
         else
           left := left + 1;
         end;
       end;
       return false;
       |- found <-> (E left0 <= x <= right0 | a(x) = b)
     end;

Next,  observe  that  when  the  condition  a(left)  >  b  becomes 
true  at  @0,  exit  from  the  while  loop  will  follow  immediately, 
since

    right  <  left

at  @1,  as  follows  from  the  assertion

    right  =  left  -  1

Therefore,  we  can  replace  @1  by  a  statement  which  branches  out
of  the  loop  directly.  This  transformation  makes  use  of  the  fact
that  the  variable  right  is  not  used  after  the  loop  exit.  The
following  praa  results.

(7)  proc search(left0, right0) returns j;
       |= sorted(a) & b : int
       left := left0;
  @0   right := right0;
  @1   (while right >= left)
         i := left;
  @2     if a(left) = b then return true; end;
  @3     if a(left) > b then quit; end;
         left := left + 1;
       end;
       return false;
       |- found <-> (E left0 <= x <= right0 | a(x) = b)
     end;

Finally, at @1, the use of right can be replaced by a use
of right0 so that statement @0 can be deleted as dead code. The
branch statements at @2 and @3 can then be combined to give:

(8)  proc search(left0, right0) returns j;

     |= sorted(a) & b : int

     left := left0;

     (while right0 >= left)

         i := left;

         if a(left) >= b then quit; end;

         left := left + 1;

     end;

     if a(i) = b then
         return(true);
     else
         return(false);
     end;

     |- found <-> (E left0 <= x <= right0 | a(x) = b)

     end;
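
As an informal cross-check of version (8), the following Python sketch mirrors its control flow. It is only an illustration under stated assumptions: the name sequential_search is ours, a is a Python list indexed directly by left0..right0, and i is initialized to None so that an empty search range returns false (the praa leaves i unassigned in that case).

```python
def sequential_search(a, b, left0, right0):
    """Sketch of version (8): scan a[left0..right0] (inclusive, sorted ascending)."""
    left = left0
    i = None                      # assumption: models "i not yet assigned"
    while right0 >= left:
        i = left
        if a[left] >= b:          # combined test from @2 and @3
            break                 # quit
        left = left + 1
    # final test of (8); the None guard covers the empty-range case
    return i is not None and a[i] == b
```

With a = [1, 3, 5, 7], searching the whole range for 5 succeeds, while searching for 4 stops at the first element that is not smaller than 4 and fails.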

1.5.3  Fibonaccian  Search 

A third search algorithm which can be derived from the high
level version is the Fibonaccian search procedure described by
Knuth [Kn73]. This algorithm computes values of i without
performing multiplication or division operations. We assume
that if n is the length of a, then n is one less than a perfect
Fibonacci number. Values of i are nodes in the corresponding
Fibonacci tree for n, obtained by adding or subtracting
successively smaller Fibonacci numbers. (A Fibonacci tree of
order m is defined as a binary tree such that the left subtree
is a Fibonacci tree of order m-1 and the right subtree is a
Fibonacci tree of order m-2.) The algorithm uses two auxiliary
variables q and p, which are always consecutive Fibonacci
numbers satisfying the predicate q < p. If we assume i is
the root of the current subtree in the Fibonacci tree being
searched, which is of height m, then p is fib(m - 1) and q is
fib(m - 2). Furthermore, left - 1 corresponds to the leftmost
leaf of the subtree under i, and right corresponds to the
rightmost leaf of the subtree.

We  shall  now  derive  the  Fibonaccian  search  algorithm  from 
the  high  level  version.  We  assume  that  m  is  the  height  of  the 
Fibonacci  tree  for  n.  Variables  i,  p,  and  q  are  initialized  by 
the  code: 

(9)        i := fib(m);
           p := fib(m-1);
           q := fib(m-2);

To update i in the case where a(i) > b, i is moved down the left
branch of the tree. Otherwise, i is moved down the right
branch. We move i down the left branch using the code sequence:


(10)       i  :=  i  -  q; 
q  :=  p  -  q; 
p  :=  p  -  q; 


Movement down the right branch is achieved by the code:


(11)       i  :=  i  +  q; 
p  :=  p  -  q; 
q  :=  q  -  p; 
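
The order of the assignments in (10) and (11) matters, since each statement sees the values produced by the one before it. The Python sketch below restates the two sequences as functions on the triple (i, p, q); the names move_left and move_right are ours and do not appear in the derivation.

```python
def move_left(i, p, q):
    """Code sequence (10), executed statement by statement."""
    i = i - q
    q = p - q      # new q is the old p - q
    p = p - q      # uses the new q, so the new p equals the old q
    return i, p, q


def move_right(i, p, q):
    """Code sequence (11), executed statement by statement."""
    i = i + q
    p = p - q      # new p is the old p - q
    q = q - p      # uses the new p
    return i, p, q
```

Starting from i = 8, p = 5, q = 3, a left move yields the consecutive Fibonacci pair p = 3, q = 2, and a right move yields p = 2, q = 1, so a right move descends two Fibonacci levels while a left move descends one.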


These  three  different  code  texts  will  eventually  be 
substituted  for  the  selection  statement.  To  prepare  for 
insertion  of  the  code  sequences  (9),  (10),  and  (11),  the  high 
level  version  is  transformed  as  follows: 

(12)  proc search(left0, right0) returns j;

      |= sorted(a) & b : int & m > 2 & left0 = 1 &
         right0 + 1 = fib(m+1)

      left := left0;
      right := right0;

@0    if right < left then go to exit; end;
@1    i := arb {k: left <= k <= right};

      (while true)

          if a(i) = b then return true; end;

          if a(i) > b then

              right := i - 1;
@2            if right < left then quit; end;

@3            i := arb {k: left <= k <= right};

          else

              left := i + 1;
@4            if right < left then quit; end;

@5            i := arb {k: left <= k <= right};

          end;

      end;
exit: return false;

      |- found <-> (E left0 <= x <= right0 | a(x) = b)
      end;


This  version  is  obtained  by  moving  and  duplicating 
statements  backwards  in  the  program  flow.  Note  also  that  in 
this  version  the  input  assumption  has  been  strengthened. 

Next, since right < left is false at @0, statement @0 can
be deleted. We now substitute the code sequences (9), (10), and
(11) for statements @1, @3, and @5 respectively. The variables
p and q are new variables, and hence assignment statements
modifying them can be introduced freely into the praa. However,
it must be shown that i is calculated correctly by these code
fragments. For this purpose, we note that the assertion:

(13)        (E k | p = fib(k-1) & q = fib(k-2) &
                   right = i + p - 1 & left = i - q - p + 1)

is not spoiled by the code (10) provided that q > 0 before (10),
and that (13) is also preserved by the code (11), provided that
p > 1; this claim can be proven by simple algebraic
manipulations. From the assertion (13) we can show that

i in {k: left <= k <= right}

is true after (10) and (11). Therefore, the substitution is
correctness preserving, provided that q > 0 before (10) and p >
1 before (11).
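
The algebraic claim can also be checked mechanically on small cases. The Python sketch below is an illustration only, not part of the proof: it walks every path of the Fibonacci tree for a given m, applies (10) after right := i - 1 and (11) after left := i + 1, and asserts that (13) and the membership of i in {k: left <= k <= right} hold after each step. The helper names fib, holds_13 and check are ours, and we assume fib(0) = 0 and fib(1) = 1.

```python
def fib(k):
    """Fibonacci numbers with fib(0) = 0, fib(1) = 1 (assumed indexing)."""
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a


def holds_13(i, p, q, left, right):
    """Assertion (13), with the existential over k checked on a finite range."""
    consecutive = any(p == fib(k - 1) and q == fib(k - 2) for k in range(2, 40))
    return consecutive and right == i + p - 1 and left == i - q - p + 1


def check(m):
    """Visit every reachable state, asserting (13) after (10) and (11)."""
    start = (fib(m), fib(m - 1), fib(m - 2), 1, fib(m + 1) - 1)
    assert holds_13(*start)
    stack, visited = [start], 0
    while stack:
        i, p, q, left, right = stack.pop()
        visited += 1
        if q > 0:                                  # guard required before (10)
            i2 = i - q
            q2 = p - q
            p2 = p - q2
            state = (i2, p2, q2, left, i - 1)      # right := i - 1 precedes (10)
            assert holds_13(*state) and left <= i2 <= i - 1
            stack.append(state)
        if p > 1:                                  # guard required before (11)
            i2 = i + q
            p2 = p - q
            q2 = q - p2
            state = (i2, p2, q2, i + 1, right)     # left := i + 1 precedes (11)
            assert holds_13(*state) and i + 1 <= i2 <= right
            stack.append(state)
    return visited
```

The number of states visited equals fib(m+1) - 1, the length of the array, which is consistent with every index being the root of exactly one subtree.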

To complete our derivation, we would like to eliminate the
variables left and right. To achieve this, we first replace the
expressions at @2 and @4 by expressions which reference p and q
instead of right and left. First of all, from the assertion

left = i - q - p + 1,

it can be shown that

right < left

at @2 is equivalent to

q <= 0,

and so the test at @2 can be replaced by an expression involving
only q. Similarly, at @4, using the assertion

right = i + p - 1,

it can be shown that

right < left

is equivalent to

p <= 1.

Note that these substitutions ensure that q > 0 is true before
(10), and p > 1 is true before (11).

Having  eliminated  these  uses  of  left  and  right,  we  can 
delete  all  assignments  to  left  and  right  since  these  variables 
are  dead.   This  gives  the  following  final  version. 


(14)  proc search(left0, right0) returns j;

      |= sorted(a) & b : int &
         m > 2 & left0 = 1 & right0 + 1 = fib(m+1)

      i := fib(m);
      p := fib(m - 1);
      q := fib(m - 2);

      (while true)

          if a(i) = b then return true; end;
          if a(i) > b then

              if q <= 0 then quit; end;

              i := i - q;
              q := p - q;
              p := p - q;

          else

              if p <= 1 then quit; end;

              i := i + q;
              p := p - q;
              q := q - p;

          end;

      end;

      return false;

      |- j <-> (E left0 <= x <= right0 | a(x) = b)

      end;
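
For readers who wish to experiment with (14), here is a direct Python transcription. It is a sketch under our own assumptions: the function name fibonaccian_search is ours, m is passed explicitly, a is a Python list holding a(1)..a(n) at positions 0..n-1, and fib uses fib(1) = fib(2) = 1.

```python
def fib(k):
    """Fibonacci numbers with fib(0) = 0, fib(1) = 1 (assumed indexing)."""
    x, y = 0, 1
    for _ in range(k):
        x, y = y, x + y
    return x


def fibonaccian_search(a, b, m):
    """Transcription of (14); requires m > 2 and len(a) == fib(m+1) - 1."""
    assert m > 2 and len(a) == fib(m + 1) - 1   # strengthened input assumption
    i, p, q = fib(m), fib(m - 1), fib(m - 2)
    while True:
        x = a[i - 1]              # a(i) in the praa, which indexes from 1
        if x == b:
            return True
        if x > b:
            if q <= 0:
                break             # quit
            i = i - q
            q = p - q
            p = p - q
        else:
            if p <= 1:
                break             # quit
            i = i + q
            p = p - q
            q = q - p
    return False
```

Note that the loop body uses only addition, subtraction and comparison, as stated at the start of this section.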


Thus we have obtained a verified Fibonaccian search
algorithm using transformations the justifications of which were
generally quite simple and local. It is our contention that the
proof of the algorithm (14) is not straightforward; and, in
fact, the easiest way to understand and verify (14) is to refer
to the root praa form and the relationship between p, q, left,
and right.


CHAPTER  2 
Praa  Language 


In this chapter we define our praa language, which is a
variant of SETL [De79], designed to facilitate program
verification. The expressions of this language are the terms
and formulae of set theory. The distinction between the syntax
of program expressions and program propositions is therefore
minimal. Note that we have omitted certain features of SETL,
such as generalized forms of assignment, because proof rules for
these inessential features are cumbersome.

In  section  2.2  we  specify  a  set  of  program  transformation 
rules,  syntactic  in  nature,  which  define  some  of  the  more 
elaborate  language  constructs  in  terms  of  simpler  language 
constructs.  A  compiler  might  in  fact  use  such  rules  to 
decompose  a  source  program,  and  so  we  call  these  transformations 
compilation  transformations.  Later  we  will  use  these 
transformations  in  algorithm  derivations. 

2.1  Language Definition

We specify the praa language using a modified form of BNF.
Metasymbols appear in the grammar as upper case strings,
terminals as lower case strings. A list of one or more items E
is indicated by E1,..., En. We use the following set of
metasymbols which occur frequently throughout this work in both
syntax and transformation rules.


Metasymbol    Meaning

X             Variable
EXP           Expression
FM            Logical Formula
L             Label
SB            Statement Block
QE            Quantifier Expression
PROP          Proposition
LHS           Left Hand Side of Assignment
              Statement


When it is necessary to distinguish between different
occurrences of the same metasymbol, we add numeric suffixes to
the metasymbols. A suffix of the form (X\EXP) appended to an
expression or a formula E means: replace all free occurrences of
the variable X by the expression EXP, provided that all
variables which are free in EXP are also free in E.

2.1.1  Expressions and Formulae


Expressions and formulae appear in both program statements
and in program propositions. Expressions are terms which denote
sets, tuples, maps, and integers. Maps are represented as sets
of pairs, and tuples are interpreted as maps with integer
domain. Integers have their usual set-theoretic interpretation;
that is, 0 is {}, 1 is {0}, 2 is {0, 1}, etc. An ordered pair
[x, y] is interpreted as {{0, {x}}, {2, {y}}}. A formula is
interpreted as true or false.
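
The set-theoretic reading of the integers can be made concrete with a few lines of Python. This is only an illustration, using frozenset as a stand-in for a set of sets; the function name numeral is ours.

```python
def numeral(n):
    """Return the integer n as the set {0, 1, ..., n-1}, built from the empty set."""
    s = frozenset()          # 0 is {}
    for _ in range(n):
        s = s | {s}          # successor: n + 1 is n together with {n}
    return s
```

Under this encoding the cardinality of numeral(n) is n, and the ordering m < n coincides with membership of numeral(m) in numeral(n).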

We  begin  with  the  syntax  of  expressions.   An  expression  EXP 
can  be  any  of  the  following. 

(1)  UNOP EXP

Unary operators include

#          Cardinality
-          Unary minus
pow        Power set
dom        Domain of a map or tuple
range      Range of a map or a tuple
arb        Selection of an arbitrary element from a set
inverse    Map inverse

We will use the notation f-inv for inverse f.


(2)  EXP1 BINOP EXP2

Binary operations include:

+          Set union or integer addition
||         Tuple concatenation
-          Integer subtraction or set difference
*          Set intersection or integer multiplication
div        Integer division
max        Maximum of two integers
min        Minimum of two integers
npow       Set of subsets of given cardinality
with       Addition of a single element to a set or tuple
less       Removal of a single element from a set or tuple

(3)  restrict ( EXP1, EXP2 )

Expression (3) defines the restriction of the map EXP1 to
the set EXP2.

(4)  (  EXP  ) 

(5)  if FM then EXP1 else EXP2 end

(6)  X 

Variable  identifier. 

(7)  INTEGER 
Integer  constant. 

(8)  {} 

The  null  set. 

(9)  [] 

The  null  tuple. 

(10)  BINOP / EXP

The result of (10) is the result of performing BINOP over all of
the elements of EXP. For example, if BINOP is +, then the
result is the union of all elements in EXP.

(11)  EXP1 {EXP2,..., EXPn}

EXP1 is a multi-valued map. The expression

EXP1{EXP2}

is the range of the restriction of EXP1 to EXP2, and the
expression

EXP1{EXP2, EXP3}

is equivalent to

EXP1{EXP2}{EXP3}.

(12)  EXP1(EXP2,..., EXPn)

EXP1 is a single-valued map or a tuple. If EXP1 is a
tuple, we allow only one index which must be an integer. For
maps EXP1, the expression form

EXP1(EXP2)

retrieves the range value for EXP2, and for general maps, the
expression form

EXP1(EXP2, EXP3)

is equivalent to

EXP1{EXP2}(EXP3).

(13)  EXP1 [ EXP2,..., EXPn ]

The expression EXP1[EXP2] yields the set which is the union
of range values for the elements of EXP2.
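
The three application forms (11), (12) and (13) can be compared side by side by modelling a map as a Python set of pairs. The sketch below is illustrative only: the function names are ours, and raising an error when f(x) is not uniquely determined is our assumption, since the text above does not say what happens in that case.

```python
def image_of(f, x):
    """f{x}: the set of all range values paired with x."""
    return {y for (d, y) in f if d == x}


def apply_single(f, x):
    """f(x): the unique range value for x (assumed to fail if not unique)."""
    values = image_of(f, x)
    if len(values) != 1:
        raise ValueError("f(x) is not uniquely determined")
    (y,) = values
    return y


def image_of_set(f, s):
    """f[s]: the union of the range values over all elements of s."""
    return {y for (d, y) in f if d in s}
```

For f = {(1, 'a'), (1, 'b'), (2, 'c')}, f{1} is {'a', 'b'}, f(2) is 'c', and f[{1, 2}] is the whole range of f.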

(14)  EXP1 ( EXP2 : EXP3 )

This is a tuple operation which extracts a subsection of
the tuple EXP1 defined by the index range EXP2 through EXP3.
EXP2 and EXP3 must be integer valued. If EXP3 is less than
EXP2, or the subrange specified is disjoint from the range of
EXP1, then the value of (14) is the null tuple.

(15)  [ EXP1,..., EXPn ]

This expression forms a tuple by explicit enumeration of
terms. We also allow formulae as tuple elements.

(16)  { EXP1,..., EXPn }

This expression forms a set or map by explicit enumeration
of terms. Formulae may also be elements of sets.

(17)  [  FORMER  ] 

(18)  {  FORMER  } 

The expressions (17) and (18) are general tuple and set
former expressions, where a FORMER can be any of

EXP : QE1,..., QEn | FM
EXP : QE1,..., QEn
QE | FM

A quantified variable expression QE is either a simple variable
X or a range expression RANGEXP, which has one of the forms:

X in EXP

or

[X1,..., Xn] in EXP

where EXP is a set,

EXP1 LESSOP1 X LESSOP2 EXP2,

or

EXP1 GTOP1 X GTOP2 EXP2.

LESSOP includes the operators < and <=, and GTOP includes the
operators > and >=. For example, the following expressions are
valid set and tuple formers.

{[x, y+1] : x in s, y in f(s) | y > x}
[k : 1 <= k <= 10]
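
Readers familiar with Python may find it helpful to compare these formers with comprehensions. The sketch below is an analogy, not a definition: the sets s and t are made-up sample values, and t stands in for the set denoted by f(s) in the example above.

```python
s = {1, 2, 3}          # sample value for s (assumed for illustration)
t = {2, 4}             # stands in for the set denoted by f(s)

# {[x, y+1] : x in s, y in t | y > x}
pairs = {(x, y + 1) for x in s for y in t if y > x}

# [k : 1 <= k <= 10]
ks = [k for k in range(1, 11)]
```

As in the praa notation, the iterators are written left to right and the condition after the bar becomes the trailing if clause.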

Note that, as distinct from SETL, we allow the use of bound
variables for which no explicit range is specified. For
example, the expression

{x | (A y in s | f(x, y))}

is allowed. However, general set former expressions must adhere
to certain well defined rules to avoid Russell's paradox. We
also allow formulae to denote elements or range elements of
sets, so that the following set formers are permitted.

{[x, x > 3] : x in s}
{true, false}


Next  we  specify  the  syntax  of  formulae.   A  formula   FM  can 
be  any  of  the  following. 

(19)  EXP1 RELOP EXP2

The set of relational operators RELOP includes the following.

=          Equality
~=         Inequality
in         Set membership
~in        Negation of set membership
>          Greater than
>=         Greater than or equal
<          Less than
<=         Less than or equal
subset     Subset
psubset    Proper subset
incl       Inclusion


(20)  not  FM 

(Also  written  as  ~  FM  ) 

(21)  FM1 BOOLOP FM2

The boolean operators BOOLOP include the following.

&          Logical and
or         Logical or
->         Implication
<->        Equivalence


(22)  (  FM  ) 

(23)  if FM1 then FM2 else FM3 end

(24)  X 

A  simple  variable  may  be  assigned  the  value  true  or  false. 
Since  maps  may  be  defined  on  boolean  values,  F(X)  can  also  be  a 
formula. 

(25)  Boolean  Constants 

true 
false 

(26)  Quantified Predicates

E QE1,..., QEn | FM

E QE1,..., QEn

A QE1,..., QEn | FM

The symbol E is used for the existential quantifier, and A for
the universal quantifier.

We  also  provide  a  facility  for  specifying  the  type  of  a 
variable.   The  general  form  of  a  type  predicate  is 

(27)  X  :   TYPE-SPEC 

If  X  satisfies  TYPE-SPEC,  then  the  value  of  (27)  is  true. 
TYPE-SPEC  can  be  any  of  the  following. 

(i)    int

The predicate (27) is true if X is an integer.

(ii)   set ( TEXP )

TEXP is either int or an expression EXP. The predicate (27) is
true if X is a subset of TEXP. Note that TEXP can be any
general expression.

The domain specifier ( TEXP ) is optional.

(iii)  map ( TEXP1 ) TEXP2

The predicate (27) is true if X is a map whose domain is a
subset of TEXP1 and whose range is a subset of TEXP2. If TEXP1
is a pair, then TEXP1 denotes a cross product. TEXP1 and TEXP2
are optional.

(iv)   smap ( TEXP1 ) TEXP2

The predicate (27) is true if X is a single-valued map whose
domain is a subset of TEXP1 and whose range is a subset of
TEXP2.

(v)    bmap ( TEXP1 ) TEXP2

This form is similar to (iv). The keyword bmap specifies a
bijective map.

(vi)   tuple ( TEXP )

X is a tuple whose range is a subset of TEXP.

Formulae and expressions may be predefined so that they can
subsequently be referred to by name. We allow expression
definitions of the form

E(X1,..., Xn) = EXP

and formula definitions of the form

P(X1,..., Xn) <-> FM.

The variables X1,..., Xn are the free variables of EXP and FM.
A subsequent occurrence of the expression

E(EXP1,..., EXPn)

is equivalent to

EXP(X1 \ EXP1,..., Xn \ EXPn)

and a subsequent appearance of the formula

P(EXP1,..., EXPn)

is equivalent to

FM(X1 \ EXP1,..., Xn \ EXPn).

Definitions may be recursive. However, recursive definitions
must be well defined in a manner which is described in section
3.3.

Note that this definitional facility is similar to the
macro facilities available in many programming languages.

2.1.2  Statements

A  praa  is  a  sequence  of  statements  SB,  where  a  statement  is 
any  of  the  following. 

(1)  Assumption

|= FM

(2)  Assertion

|- FM

(3)  Assignment Statement

The general form of an assignment is

LHS := EXP;

or

LHS := FM;

The value of EXP or FM is assigned to the left hand side LHS.
Possible left hand side forms are given in the following
assignments.

(i)    Simple Assignment

X := EXP;

(ii)   Map Assignment

X{EXP1} := EXP2;

After this assignment, the expression X{EXP1} yields the value
EXP2.

(iii)  Single-valued Map or Tuple Assignment

X(EXP1) := EXP2;

(iv)   Parallel Assignment

[LHS1,..., LHSn] := EXP;

EXP must denote an n-tuple.

(4)  From  Statement 

LHS  from  X  ; 
An  element  is  removed  from  X  and  assigned  to  LHS. 

(5)  Label 

L  : 

(6)  Branch  Statements 

We  provide  branch  statements  of  the  form 

go  to  L  ; 

if FM then SB end;

if FM then SB1 else SB2 end;

if FM1 then SB1 elseif FM2 then SB2 ... else SBn end;

(7)  If  Exists  Statements 

For each if statement form in (6), we provide an alternative if
exists form.

if exists QE1,..., QEn | FM then ...

or

if exists QE1,..., QEn then ...

An if exists statement has the side effect of assigning values
to the bound variables if the condition is true. For example,
the statement:

if exists x in s | x > 3 then go to loop;

assigns a value to x which is in s and which is greater than 3
if such an element exists. If the bound variable expression is
of the form

EXP1 LESSOP X LESSOP EXP2

then the variable X is assigned the smallest value within the
specified range; if we replace LESSOP by GTOP, then X is
assigned the largest value within the specified range.
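
The side effect of an if exists test can be pictured as a search that returns a witness along with the truth value. The Python sketch below is our own illustration: the function names are hypothetical, and the integer-range version assumes inclusive bounds (the <= and >= forms).

```python
def if_exists_in(s, pred):
    """if exists x in s | pred(x): return (True, witness) or (False, None)."""
    for x in s:
        if pred(x):
            return True, x       # any satisfying element may be chosen
    return False, None


def if_exists_range(lo, hi, pred, ascending=True):
    """Integer range lo..hi inclusive.

    ascending=True models the LESSOP form (smallest witness);
    ascending=False models the GTOP form (largest witness).
    """
    xs = range(lo, hi + 1) if ascending else range(hi, lo - 1, -1)
    for x in xs:
        if pred(x):
            return True, x
    return False, None
```

For instance, searching 1..10 for a multiple of 3 yields the witness 3 in the ascending form and 9 in the descending form.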

(8)   While  Loop 

( PROP1,..., PROPn
  while FM )

    SB
end;

The proposition list may contain zero or more assumptions and
assertions. The control dictions

quit;

and

cont;

may be used within a while or forall loop (see (9)).

We also provide a while exists form, which is similar to the if
exists statement.

( PROP1,..., PROPn
  while exists QE1,..., QEn | FM )

    SB
end;


(9)  Forall loop

( PROP1,..., PROPn
  forall RANGEXP | FM )

    SB
end;

or

( PROP1,..., PROPn
  forall RANGEXP )

    SB
end;

The proposition list may contain zero or more assumptions and
assertions. The variable bound by the RANGEXP is assigned
successive values, as specified by the range; if RANGEXP
specifies an integer range, the order in which range values are
assigned to the bound variable is determined by the relational
operator in the obvious manner. Note that unlike SETL, we allow
only a single range expression in a forall loop. Variables used
in the RANGEXP must not be modified within the loop.

A program aborts if an attempt is made to extract an
element from the null set. For example, the statement

y := arb {x : x in s | x ~= x};

will cause the program to abort. Expressions of the form

max / EXP
min / EXP

and statements of the form

LHS from EXP;

will also cause a program to abort if EXP is null, since these
forms are special cases of element selection.




2.1.3  Functions 

Functions,  which  may  be  recursive,  are  defined  as  follows. 

proc F (Y1,..., Yn) returns Z1,..., Zm;
|= FM1
SB
|- FM2
end;

The parameters Y1,..., Yn are the input parameters. Variables
Z1,..., Zm are the output parameters; the value of the function
is the value of the tuple

[Z1,..., Zm]

when the function returns. (If there is only one return
variable Z1, then the value of the function is the value of Z1
when the function returns.) The statement

return;

or

return EXP1,..., EXPm;

causes the function to return. In the latter case, the values
of EXP1,..., EXPm are implicitly assigned to Z1,..., Zm.

In order to ensure proof rule soundness, functions are not
allowed to have side effects. Therefore, formal parameters are
not allowed to appear on the left hand side of any assignment
statement. Global variables may be referenced, but not modified
within a function.

The assumption FM1 is called the function's input
assumption. The free variables of FM1 may include the input
parameters Y1,..., Yn and global variables. The input
assumption therefore specifies properties of the input
parameters. The assertion FM2 is called the output assertion.
The free variables of FM2 may include the input parameters
Y1,..., Yn, global variables, and the output variables Z1,...,
Zm. The output assertion therefore specifies properties of the
output variables with respect to the input parameters. The
function F is partially correct if whenever the input assumption
is satisfied at the entry point, the output assertion is
satisfied when the function returns.

A function is called by an expression of the form

F(EXP1,..., EXPn)

If F is partially correct, and EXP1,..., EXPn satisfy FM1,
then the value of the function call expression satisfies FM2.
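
The roles of the input assumption and the output assertion can be illustrated, though of course not proved, by checking them at run time. The Python sketch below is our own example: the decorator checked and the function isqrt are hypothetical, with the lambda passed as pre playing the part of FM1 and the one passed as post the part of FM2.

```python
def checked(pre, post):
    """Wrap a function so FM1 is checked on entry and FM2 on return."""
    def wrap(f):
        def g(*args):
            assert pre(*args), "input assumption violated"
            result = f(*args)
            assert post(result, *args), "output assertion violated"
            return result
        return g
    return wrap


@checked(pre=lambda n: n >= 0,
         post=lambda z, n: z * z <= n < (z + 1) * (z + 1))
def isqrt(n):
    """Integer square root by linear search (no side effects on inputs)."""
    z = 0
    while (z + 1) * (z + 1) <= n:
        z += 1
    return z
```

A run-time check of this kind tests the output assertion only on the inputs actually supplied, whereas partial correctness in the sense above is a statement about every computation that satisfies the input assumption.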

2.2  Compilation Transformations

In  this  section  we  present  our  first  set  of  program 
transformations;  because  of  their  syntactic  nature,  we  call 
them  compilation  transformations.  The  semantics  of  more 
elaborate  dictions  in  our  language  are  defined  by  expanding  them 
into  dictions  of  a  core  language. 

Throughout  this  thesis,  program  transformations  are 
specified  as  rewrite  rules  of  the  form 

left-hand-pattern    =>   right-hand-pattern 
or 

left-hand-pattern    <=>   right-hand-pattern 
The  first  form  means  that  the  transformation  can  be   applied   in 
only   one   direction,   and   the   second   form  means   that   the 
transformation  is  bi-directional. 

Transformation  rules  for  indexed  and  parallel  assignment 
statements  are  given  below. 

(1)  Multi-valued  Map  Assignment 

X{EXP1} := EXP2;   <=>

X := {[EXP1, X1] : X1 in EXP2} +
     {[X1, X2] : [X1, X2] in X | X1 ~= EXP1};

(2)  Indexed Assignment

X(EXP1) := EXP2;   <=>

X := {[EXP1, EXP2]} + {[X1, X2] : [X1, X2] in X | X1 ~= EXP1};
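
Rules (1) and (2) can be read as functional updates on a set of pairs. The Python sketch below restates the two right hand sides using set comprehensions; the function names assign_multi and assign_single are ours.

```python
def assign_multi(X, e1, e2):
    """X{e1} := e2, where e2 is a set: replace every pair whose first component is e1."""
    return {(e1, x1) for x1 in e2} | {(x1, x2) for (x1, x2) in X if x1 != e1}


def assign_single(X, e1, e2):
    """X(e1) := e2: install the single pair [e1, e2]."""
    return {(e1, e2)} | {(x1, x2) for (x1, x2) in X if x1 != e1}
```

In both cases the old pairs for e1 are discarded before the new ones are added, so after assign_single the map is single-valued at e1.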

(3)  Parallel Assignment

[LHS1, ..., LHSn] := EXP;   <=>

(X is a dead variable)

X := EXP;
LHS1 := X(1);
LHS2 := X(2);
...
LHSn := X(n);

A similar rule can be formulated for a statement of the form

[LHS1, ..., LHSn] := arb EXP;

(4)  From  statement 

LHS  from  X;   <=> 

LHS  :=  arb  X; 
X  :=  X  less  LHS; 
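
In Python terms, rule (4) amounts to picking some element and returning the set without it. The sketch below is illustrative; the name take_from is ours, and raising ValueError models the abort that occurs when arb is applied to the null set.

```python
def take_from(X):
    """LHS from X: return (LHS, new X); fails on the null set."""
    if not X:
        raise ValueError("arb applied to the null set")
    lhs = next(iter(X))        # LHS := arb X (any element will do)
    return lhs, X - {lhs}      # X := X less LHS
```

The caller must not depend on which element is returned, only on the fact that it was a member of X and is no longer one.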

We  next  define  structured  control  statements   in   terms   of 
simple  branches  and  labels. 

(5)  IF-THEN 

if  FM  then  SB  end;   <=> 

(label  L  is  not  referenced  elsewhere) 

if  ~FM  then  go  to  L;  end;   SB  L: 

(6)  IF-THEN-ELSE 

if FM then SB1 else SB2 end;   <=>

(L1 and L2 are not referenced elsewhere)

if ~FM then go to L1; end;

SB1

go to L2;

L1:   SB2   L2:

(7)  If exists

if exists X in EXP | FM then...

<=>

if E X in EXP | FM then...

X := arb {X : X in EXP | FM};


if exists EXP1 LESSOP1 X LESSOP2 EXP2 | FM...

<=>

if E EXP1 LESSOP1 X LESSOP2 EXP2 | FM...

X := min / {X : EXP1 LESSOP1 X LESSOP2 EXP2 | FM};


if exists EXP1 GTOP1 X GTOP2 EXP2 | FM...

<=>

if E EXP1 GTOP1 X GTOP2 EXP2 | FM...

X := max / {X : EXP1 GTOP1 X GTOP2 EXP2 | FM};

If there is more than one bound variable in the bound variable
list, transformation (7) may be applied repeatedly, processing
the bound variables in right to left order.

(8)   While-Loop 

(PROP while FM2)   SB end;   <=>

(L1 and L2 are not referenced elsewhere)

L1:   PROP   if ~FM2 then go to L2; end;

SB

go to L1;
L2:

The while exists rules are similar to the 3 if exists rules.

(9)  forall-loop 

(PROP  forall   X  in  EXP  |  FM) 

SB 
end; 

<=> 

(Xexp  is  dead) 

Xexp  :=  {}; 

(PROP  while  exists  X  in  EXP  -  Xexp  |  FM) 

Xexp  :=  Xexp  with  X; 

SB 
end  while; 
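
The role of the auxiliary set Xexp in this rule may be easier to see in executable form. The Python sketch below is an illustration under our own naming: exp is a Python set, fm a predicate, and body a callable standing in for SB; the list of candidates is recomputed on each pass to mirror the while exists test.

```python
def forall_as_while_exists(exp, fm, body):
    """Expansion of (PROP forall X in EXP | FM) SB end; without the propositions."""
    xexp = set()                                   # Xexp := {};
    while True:
        # while exists X in EXP - Xexp | FM
        candidates = [x for x in exp - xexp if fm(x)]
        if not candidates:
            break
        x = candidates[0]                          # an arbitrary witness
        xexp = xexp | {x}                          # Xexp := Xexp with X;
        body(x)                                    # SB
    return xexp
```

Each qualifying element is visited exactly once, in no particular order, and on exit Xexp holds precisely the elements of EXP that satisfy FM.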

(PROP forall EXP1 LESSOP1 X LESSOP2 EXP2 | FM)

SB
end;

<=>

(L1 and L2 are not referenced elsewhere)

if LESSOP1 is '<' then
    X := EXP1 + 1;
else
    X := EXP1;
end;
L1:  if ~ X LESSOP2 EXP2 then go to L2; end;

if FM then SB end;

X := X + 1;

go to L1;
L2:


A  similar  compilation  rule  can  be  formulated  for   the   case 
in  which  GTOP  appears  in  the  header  instead  of  LESSOP. 

The statement quit appearing in a loop expands into
go to L2;
and the statement cont expands into
go to L1;

We   next   state   compilation   transformations   for  various 
formulae  and  expressions. 

(10)  Bound  Variables 

We  first  list  various  transformations  that  can  be  performed 
on  bound  variables. 

(i)    E X in S | FM   <=>   E X | X in S & FM

(ii)   E EXP1 LESSOP1 X LESSOP2 EXP2 | FM
       <=>
       E X | EXP1 LESSOP1 X & X LESSOP2 EXP2 & FM

(iii)  A X in S | FM   <=>   A X | X in S -> FM

(iv)   A EXP1 GTOP1 X GTOP2 EXP2 | FM
       <=>
       A X | EXP1 GTOP1 X & X GTOP2 EXP2 -> FM

(v)    E [X1, X2] in EXP | FM
       <=>
       E X in EXP, X1, X2 | X = [X1, X2] & FM

(vi)   A [X1, X2] in EXP | FM
       <=>
       A X in EXP, X1, X2 | X = [X1, X2] -> FM
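
Rules (i) and (iii) state that a bounded existential turns into a conjunction while a bounded universal turns into an implication. The Python sketch below checks this on finite data; it is illustrative only, the function names are ours, and the finite collection universe stands in for the unrestricted range of the bound variable.

```python
def exists_bounded(S, fm):
    """E X in S | FM"""
    return any(fm(x) for x in S)


def exists_expanded(universe, S, fm):
    """E X | X in S & FM, with X ranging over a finite stand-in universe."""
    return any((x in S) and fm(x) for x in universe)


def forall_bounded(S, fm):
    """A X in S | FM"""
    return all(fm(x) for x in S)


def forall_expanded(universe, S, fm):
    """A X | X in S -> FM, with the implication written as (not a) or b."""
    return all((x not in S) or fm(x) for x in universe)
```

Using a conjunction in the universal case would be wrong, since it would require every element of the universe to belong to S.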


We  next  give  the  rules  for  expanding  set  and  tuple  former 
and  compound  operator  expressions,  which  are  implicit  loops, 
into  explicit  loops. 

(11)  Compound  Operator 

LHS := BINOP / EXP;   =>

(X1, X2, and X3 are dead)

X1 := EXP;
X2 from X1;
( |- X2 = BINOP / (EXP - X1)
  while #X1 ~= 0)

    X3 from X1;
    X2 := X2 BINOP X3;
end;
LHS := X2;

Note that this code sequence does not terminate if EXP is empty.
A loop invariant assertion is introduced.
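
The expansion of the compound operator, together with its loop invariant, can be tried out in Python. The sketch below is our own illustration: the function name compound is hypothetical, the invariant is checked with functools.reduce, and the check is only meaningful for operators that are associative and commutative, since elements are drawn from the set in arbitrary order. An empty EXP is reported with ValueError, modelling the failure of the first from statement.

```python
from functools import reduce
import operator


def compound(binop, exp):
    """Explicit-loop form of LHS := BINOP / EXP."""
    full = set(exp)
    x1 = set(full)                       # X1 := EXP;
    if not x1:
        raise ValueError("BINOP / EXP with EXP empty")
    x2 = x1.pop()                        # X2 from X1;
    while len(x1) != 0:                  # while #X1 ~= 0
        # invariant: X2 = BINOP / (EXP - X1)
        assert x2 == reduce(binop, full - x1)
        x3 = x1.pop()                    # X3 from X1;
        x2 = binop(x2, x3)               # X2 := X2 BINOP X3;
    return x2                            # LHS := X2;
```

For example, compound(operator.add, {1, 2, 3, 4}) computes the sum of the set, and compound(max, s) its largest element.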


(12)  Set Former

LHS := {EXP : RANGEXP1,..., RANGEXPn | FM};   =>

(X is a dead variable)

X := {};

( |- fm1
  forall RANGEXP1)

    ( |- fm2
      forall RANGEXP2)

        ...

        ( |- fmn
          forall RANGEXPn | FM)

            X := X with EXP;

        end forall;

        ...

end forall;
LHS := X;


The formulae fmi introduced as loop invariants depend on
the forms of the range expressions RANGEXPi, 1 <= i <= n.
RANGEXPi has one of the following forms.

(i)    Xi in EXPi
(ii)   EXP1i LESSOP1i Xi LESSOP2i EXP2i
(iii)  EXP1i GTOP1i Xi GTOP2i EXP2i

We define new range expressions sofar(RANGEXPi) which will be
part of the loop assertions. If RANGEXPi has form (i), then
sofar(RANGEXPi) is

Ii in (EXPi - Xexpi)

Ii is a variable which is dead, and Xexpi corresponds to
the variable introduced in transformation (9). If RANGEXPi has
form (ii), then sofar(RANGEXPi) is

EXP1i LESSOP1i Ii < Xi

Finally, if RANGEXPi has form (iii), then sofar(RANGEXPi)
is

EXP1i GTOP1i Ii > Xi.

We can now specify the loop assertions generated by (12).
The assertion fm1 has the form

X = {EXP(X1\I1) : sofar(RANGEXP1), RANGEXP2,..., RANGEXPn | FM}

Assertion fm2 has the form

X = {EXP(X1\I1) : sofar(RANGEXP1), RANGEXP2,..., RANGEXPn | FM}
    +
    {EXP(X2\I2) : sofar(RANGEXP2), RANGEXP3,..., RANGEXPn | FM}

and so on up to assertion fmn which has the form

X = {EXP(X1\I1) : sofar(RANGEXP1), RANGEXP2,..., RANGEXPn | FM}
    +
    {EXP(X2\I2) : sofar(RANGEXP2), RANGEXP3,..., RANGEXPn | FM}
    +
    ... +
    {EXP(Xn\In) : sofar(RANGEXPn) | FM}

A similar rule can be formulated for tuple former
expressions.
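
To make the shape of these invariants concrete, the Python sketch below expands a two-variable set former over integer ranges of form (ii) and asserts fm1 and fm2 at the corresponding loop heads. It is an illustration under our own assumptions: both ranges use <= bounds, the function name set_former is ours, and exp and fm are passed as Python callables of (x1, x2).

```python
def set_former(lo1, hi1, lo2, hi2, exp, fm):
    """Explicit loops for {exp(x1, x2) : lo1 <= x1 <= hi1, lo2 <= x2 <= hi2 | fm}."""
    X = set()
    x1 = lo1
    while x1 <= hi1:
        # fm1: X holds the contributions of all earlier outer values i1 < x1
        done_outer = {exp(i1, x2)
                      for i1 in range(lo1, x1)
                      for x2 in range(lo2, hi2 + 1) if fm(i1, x2)}
        assert X == done_outer
        x2 = lo2
        while x2 <= hi2:
            # fm2: fm1's set plus the current row restricted to i2 < x2
            done_inner = {exp(x1, i2)
                          for i2 in range(lo2, x2) if fm(x1, i2)}
            assert X == done_outer | done_inner
            if fm(x1, x2):
                X = X | {exp(x1, x2)}      # X := X with EXP;
            x2 += 1
        x1 += 1
    return X
```

Calling set_former(1, 3, 1, 3, lambda a, b: (a, b), lambda a, b: a < b) builds the set of pairs with a < b while checking both invariants on every iteration.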


(13)  IF-THEN-ELSE expressions

LHS := if FM then EXP1 else EXP2 end;   <=>

if FM then
    LHS := EXP1;
else
    LHS := EXP2;
end;


CHAPTER  3 
Program  Proof  Technique 


In  this  chapter  we  describe  our  method  for  verifying 
assertions  at  specified  program  locations.  Briefly,  our  proof 
technique  is  the  following.  Suppose  we  wish  to  prove  that  a 
proposition  P  is  true  at  a  program  place  @.  In  general,  we  must 
supply  and  verify  additional  propositions,  which  serve  as 
hypotheses  in  the  proof  of  P.  Once  these  annotations  are 
complete,  relevant  program  segments  are  compiled  into  logical 
formulae,  called  verification  conditions.  We  then  attempt  to 
prove these verification conditions by interacting with a proof
checker.  If  we  can  prove  the  verification  conditions,  then  we 
can  make  P  an  assertion  at  @. 

In  the  process  of  algorithm  derivation,  it  is  necessary  to 
use  the  proof  mechanism  both  to  verify  the  initial  praa,  and  to 
verify  subsequent  program  transformations.  At  each  stage  the 
partial  correctness  property  of  the  praa  is  maintained. 
Initially,  a  praa  which  has  not  yet  been  verified  contains  only 
assumptions  -  typically  an  input  and  an  output  assumption,  and 
an assumption breaking each program loop. (Note that a praa
that contains assumptions and no assertions is always partially
correct.)  Various  proof  rules  are  then  applied  with  the  ultimate 
goal  of  replacing  assumptions  by  assertions.  The  praa  is 
considered  verified  when  the  only  assumption  remaining  is  the 
input  assumption.  Program  transformations  introduce  additional 
assumptions  into  the  text.  These  assumptions  are  verified  by 
the  same  proof  technique. 

Specifically, our proof system provides the following
capabilities:

(i) the introduction of new assumptions into a praa;

(ii) the elimination of assertions from a praa;

(iii) the compilation of praa segments into verification
conditions; and

(iv) the proof of verification conditions, which enables
the replacement of assumptions by assertions.

The  logical   soundness   of  our   system   follows   from  the 
soundness  of  the  proof  rules  given  in  Schwartz  [DS77] . 

3.1  Assumption  and  Assertion  Rules 

We first state a set of correctness-preserving
transformation rules involving assumptions and assertions.
Assumption rules allow assumption strengthening, assumption
splitting and joining, and assumption interchange.

(1)  Assumption Splitting

|= FM1 & FM2      <=>     |= FM1
                          |= FM2

(2)  Assumption Interchange

|= FM1      <=>     |= FM2
|= FM2              |= FM1

(3)  Assumption Strengthening

empty      =>     |= FM

Assertion  rules,  which  are  similar  to  assumption  rules,  allow 
assertion  weakening,  splitting,  joining,  and  interchange  with 
other  assertions. 

(4)  Assertion Splitting

|- FM1 & FM2      <=>     |- FM1
                          |- FM2

(5)  Assertion Interchange

|- FM1      <=>     |- FM2
|- FM2              |- FM1

(6)  Assertion Weakening

|- FM      =>     empty

(7)  Assumption Degradation

@  |= FM      =>     |- FM

An assumption can be changed into an assertion provided that the
verification conditions generated for FM at @ are provable. (A
detailed definition of these verification conditions is given in
section 3.2 below.) The assumption degradation rule is based on
the induction principles described in Schwartz [DS77].

(8)   Valid  Assertion  Introduction 
As  a  special  case  of  (7),   any  valid   logical   formula   can  be 
introduced  at  any  place  in  a  praa. 

empty  =>        |-  FM 

provided  that  FM  is  provable. 

In the next section, we state a procedure for generating
verification conditions. In section 3.4 we discuss set-theoretic
proof techniques for proving verification conditions.

3.2  Generation  of  Verification  Conditions 

A  verification  condition  reflects  part  of  the  semantics  of 
a  praa  text,  namely  that  part  relevant  to  the  portion  of  a  praa 
extending  from  a  place  @'  to  @,  where  @'  precedes  @  in  the 
control  flow  sequence.  In  the  following  discussion,  the  place 
@'  will  be  called  a  starting  place.  The  number  of  verification 
conditions  generated  for  P  at  @  depends  on  the  number  of 
distinct simple paths from starting places to @.

A  verification  condition  VC  which  we  generate  for  an 
assumption  P  at  a  place  @  in  praa  R  must  satisfy  the  following 
criterion. If all VC are valid and P is degraded to an
assertion, then R remains partially correct. In order to
satisfy this criterion, the set of starting places S must obey
the following rules:

(i) For all @' in S, there is an execution path from @' to
@.

(ii) For all @' in S, if there is an execution path from @'
to @ which contains a cycle C, then C contains a starting place.

(iii) If there is a path from the entry point of R to @
which does not pass through a starting place, then the entry
point is a starting place.

For every starting place @', and for every simple path from
@' to @ which does not pass another starting place, we generate
a verification condition. (An additional verification condition
is generated for each function call occurring on such a path, as
described below. We assume in what follows that the places @
and @' are in the same procedure.)

We  give  several  examples  to  illustrate  these  rules. 
Suppose R is a praa of the form

@0      |= FM
        if FM1 then
            SB1
        else
            SB2
        end;
@1      SB3
@       |= FM2

The set {@0} satisfies (i) - (iii) since there is a path from @0
to @, and all paths from @0 to @ are cycle free. Two
verification conditions will be generated because of the if
statement. The set {@1} also satisfies (i) - (iii). In this
case, only one verification condition will be generated since
SB3 is a straight line code sequence. As a second example,
consider the praa form:

@0      |= FM
        SB1
@1      (while FM1)
            SB2
@2          (while FM2)
                SB3
            end;
            if FM3 then quit; end;
        end;
        SB4
@       |= FM4

The sets {@0, @2} and {@1, @2} each satisfy conditions (i) -
(iii). In each case, two verification conditions will be
generated. Note that the set {@0, @1} does not satisfy
condition (ii), because the inner loop at @2 forms a cycle
along a path from @1 to @ which contains no starting place.
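
Conditions (i) - (iii) can be checked mechanically once the praa is
viewed as a control flow graph. The following Python sketch is only an
illustration under stated assumptions: the graph encoding, the node
names (SB1, IF3, and so on) and the helper functions are hypothetical
and are not part of the system described here. It encodes the second
example above and tests the three candidate sets discussed in the text.

```python
from typing import AbstractSet, Dict, List, Set

Graph = Dict[str, List[str]]  # node -> successor nodes (hypothetical encoding)


def reachable(graph: Graph, sources: AbstractSet[str],
              removed: AbstractSet[str] = frozenset()) -> Set[str]:
    """Nodes reachable from `sources` along paths that avoid `removed`."""
    seen: Set[str] = set()
    stack = [s for s in sources if s not in removed]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(m for m in graph[node] if m not in removed)
    return seen


def reverse(graph: Graph) -> Graph:
    """Graph with every edge reversed."""
    rev: Graph = {n: [] for n in graph}
    for n, succs in graph.items():
        for m in succs:
            rev[m].append(n)
    return rev


def has_cycle(graph: Graph, nodes: AbstractSet[str]) -> bool:
    """Kahn's algorithm on the subgraph induced by `nodes`."""
    indeg = {n: 0 for n in nodes}
    for n in nodes:
        for m in graph[n]:
            if m in nodes:
                indeg[m] += 1
    ready = [n for n in nodes if indeg[n] == 0]
    removed = 0
    while ready:
        n = ready.pop()
        removed += 1
        for m in graph[n]:
            if m in nodes:
                indeg[m] -= 1
                if indeg[m] == 0:
                    ready.append(m)
    return removed < len(nodes)


def valid_starting_places(graph: Graph, entry: str, target: str,
                          starts: AbstractSet[str]) -> bool:
    starts = set(starts)
    # (i) every starting place has an execution path to the target place
    if any(target not in reachable(graph, {s}) for s in starts):
        return False
    # (ii) no cycle lying on a path from a starting place to the target
    #      may avoid all starting places
    on_paths = reachable(graph, starts) & reachable(reverse(graph), {target})
    if has_cycle(graph, on_paths - starts):
        return False
    # (iii) a path from the entry point that avoids all starting places
    #       forces the entry point itself to be a starting place
    if entry not in starts and target in reachable(graph, {entry}, removed=starts):
        return False
    return True


# The second example: nested while loops with a quit statement.
SECOND_EXAMPLE: Graph = {
    "@0": ["SB1"], "SB1": ["@1"],
    "@1": ["SB2", "SB4"],          # while FM1: body / exit
    "SB2": ["@2"],
    "@2": ["SB3", "IF3"],          # while FM2: body / exit
    "SB3": ["@2"],
    "IF3": ["SB4", "@1"],          # quit / continue outer loop
    "SB4": ["@"], "@": [],
}
```

Under this encoding the sets {@0, @2} and {@1, @2} are accepted, while
{@0, @1} is rejected because the inner loop contains no starting place.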

In  the  standard  Floyd/Hoare  verification  systems,  the 
verification  compiler  uses  all  the  program  propositions  as 
starting  places.  Since  in  these  systems  all  loops  must  be 
annotated  with  an  invariant,  condition  (ii)  is  guaranteed.  In 
our system we will also generally use program propositions as
starting places. However, because we are less restrictive about
the  placement  of  propositions,  we  give  the  user  the  freedom  of 
specifying  different  starting  places. 

Suppose @' is a starting place, and suppose @' = @1, @2,
..., @n = @ is a simple path from @' to @ such that @2,..., @n-1
are not starting places. Then we generate a verification
condition as follows. The statements at @1,..., @n-1 are
transformed into hypotheses of the verification condition in a
manner depending on the form of the statement as explained in
more detail below. The assumption at @ is transformed into the
conclusion of the verification condition.

To see how the statements at @1,..., @n-1 must be handled,
note that an assignment statement at @i changes the value of a
variable which may have already been used at @j, j < i. Since
the verification condition is a logical formula, we cannot use
the same variable name to denote different values. To cope with
this small technical difficulty, we simply generate a new name
to denote the new value of a variable whenever we process an
assignment statement, and maintain a map current-names, which
maps a program variable into the name denoting its current
value. We replace subsequent occurrences of the variable by its
current name. Initially, for all free variables X in R,
current-names(X) = X. We collect the hypotheses generated in a
set hypotheses, initially empty. Then for each location @i, 1
<= i <= n-1, processed in sequential order, we perform the
following.

(We assume initially that expressions do not contain
embedded function calls.) Take an action (as specified in detail
below) depending on the statement type at @i. We will use the
notation

FM \ current

to mean: replace all free variables X in FM by
current-names(X).

(1)  Assumption or Assertion

|= FM     or     |- FM

In this case add the formula

FM \ current

to the set hypotheses.

(2)  Simple Assignment

X := EXP;

Let Xnew be a new name generated for X. To hypotheses we add
the formula:

Xnew = EXP \ current

and set current-names(X) to Xnew.

(3)  Selection Assignment

X := arb EXP;

Generate a new name Xnew and add the formula

Xnew in EXP \ current

to hypotheses, and set current-names(X) to Xnew.

(4)  Parallel Assignment

[Y1,...,Ym] := EXP;

Generate variables Y1new,...,Ymnew. We add the formula:

[Y1new,...,Ymnew] = EXP \ current

to hypotheses, and update current-names for Y1,...,Ym.

(5)  Parallel Selection Assignment

[Y1,...,Ym] := arb EXP;

Generate variables Y1new,...,Ymnew and add the formula

[Y1new,...,Ymnew] in EXP \ current

to hypotheses. Also, update current-names for the variables
Y1,...,Ym.

(6)  Function Call

Suppose F is a function which is declared by the statement

proc F(Y1,...,Yj) returns Z1,...,Zm;

and suppose that FM1 is the input assumption and FM2 is the
output assertion of F. Consider the following function call.

(*)     [W1,...,Wm] := F(EXP1,...,EXPj);

To handle this statement, we must generate a hypothesis which
states that the variables W1,...,Wm satisfy FM2. However, FM2
will be true after the function call only if the input
assumption is satisfied before the function call. Therefore, we
must generate an additional verification condition at this point
which states that the input assumption of the function is
satisfied; the hypotheses of this additional verification
condition are the formulae stored in hypotheses so far.
Specifically, the verification condition generated for the
function call (*) is:

hypotheses ->
    FM1(Y1 \ EXP1,...,Yj \ EXPj) \ current

Then generate new names W1new,...,Wmnew, and add the following
formula to hypotheses:

FM2(Z1 \ W1new,...,Zm \ Wmnew,
    Y1 \ EXP1,...,Yj \ EXPj) \ current

Finally, update the current names of W1,...,Wm.

(7)  If Statement

if FM then ...

If the statement at @i+1 is the statement which is executed when
FM is true, then we add the formula

FM \ current

to hypotheses. If the statement at @i+1 is the statement
executed when FM is false, then we add the formula

~FM \ current

to hypotheses.

(8)  While Loop Header

(while FM)

If the statement at @i+1 is the statement executed when FM is
true, then we add the formula

FM \ current

to hypotheses. Otherwise, we add the formula

~FM \ current

to hypotheses.

(9)  'If exists' Statement

if exists Y1,...,Ym | FM ...

The if exists statement implicitly assigns values to variables
Y1,...,Ym if the statement condition is true. Therefore, if the
statement at @i+1 is the statement which is executed when the
condition is true, we generate new variables Y1new,...,Ymnew,
and add the following formula to hypotheses:

FM(Y1 \ Y1new,...,Ym \ Ymnew) \ current

The current-names of Y1,...,Ym are updated. (Note that the
bounded quantifier form:

if exists Y in S | FM ...

is equivalent to

if exists Y | Y in S & FM ...

and is therefore covered by the rule (9).)

As an example of a particular form of the 'if exists'
statement, we consider the statement form:

if exists [Y1,...,Ym] in S | FM ...

which implicitly assigns values to variables Y1,...,Ym. As
above we generate variables Y1new,...,Ymnew. We then let FM' be
the formula

[Y1new,...,Ymnew] in S & FM(Y1 \ Y1new,...,Ym \ Ymnew)

and add to hypotheses the formula

FM' \ current

Finally, we update the current-names of Y1,...,Ym.

As another example, let us consider the bounded quantifier
form

if exists EXP1 <= Y <= EXP2 | FM ...

In this case, if the statement condition is true, Y is assigned
the minimum value in the range from EXP1 to EXP2 which satisfies
FM. We therefore generate a new variable Ynew, and let FM' be
the formula

EXP1 <= Ynew <= EXP2 & FM(Y \ Ynew) &
Ynew = min / {EXP1 <= Y <= EXP2 | FM}

Then we add the formula

FM' \ current

to hypotheses. (A similar rule can be formulated for bounded
quantifiers of the form

EXP1 >= Y >= EXP2.)

So far we have only considered hypotheses generated when
the 'if exists' condition is true. If the statement at @i+1 is
the statement which is executed when the if condition FM is
false, then there is no implicit assignment, and the semantics
of the if exists statement are identical to the semantics of the
if statement in rule (7).

(10)  While exists Loop Header

(while exists Y1,...,Ym | FM)

Hypotheses are generated as in rule (9).

(11)  Forall Loop Header

We first consider the form

(|- FM1
 forall X in S | FM)

An implicit variable S-loopvar is associated with every
forall-loop of this form, and the above header is equivalent to:

S-loopvar := {};
(|- FM1
 while exists X in S - S-loopvar | FM)
    S-loopvar := S-loopvar + {X};

We generate hypotheses for the forall-loop as if this code
sequence had appeared in place of the actual forall header.
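
The renaming discipline behind rules (1), (2), (3), (7) and (8) can be
sketched in a few lines of Python. This is an illustration only, under
assumptions that are not part of the system described here: statements
are encoded as tuples, formulae are plain strings, fresh names are
formed as X_1, X_2, ... (assumed not to clash with program variables),
and `compile_path` and `rename` are hypothetical helper names.

```python
import re
from typing import Dict, List, Tuple


def rename(formula: str, current: Dict[str, str]) -> str:
    """FM \\ current: replace each program variable by its current name."""
    return re.sub(r"[A-Za-z_]\w*",
                  lambda m: current.get(m.group(0), m.group(0)),
                  formula)


def compile_path(path: List[tuple], conclusion: str) -> Tuple[List[str], str]:
    """Turn the statements on a simple path into hypotheses and a conclusion."""
    current: Dict[str, str] = {}     # current-names; a missing key means X itself
    counters: Dict[str, int] = {}
    hypotheses: List[str] = []

    def fresh(var: str) -> str:
        counters[var] = counters.get(var, 0) + 1
        return f"{var}_{counters[var]}"

    for stmt in path:
        kind = stmt[0]
        if kind == "assume":                  # rule (1): |= FM or |- FM
            hypotheses.append(rename(stmt[1], current))
        elif kind in ("assign", "arb"):       # rules (2) and (3)
            _, var, exp = stmt
            rhs = rename(exp, current)        # right side uses the old names
            new = fresh(var)
            relation = "=" if kind == "assign" else "in"
            hypotheses.append(f"{new} {relation} {rhs}")
            current[var] = new
        elif kind == "branch":                # rules (7) and (8)
            _, cond, taken = stmt
            fm = rename(cond, current)
            hypotheses.append(fm if taken else f"~({fm})")
        else:
            raise ValueError(f"unsupported statement kind: {kind}")
    return hypotheses, rename(conclusion, current)


# A small path: assumption, assignment, false branch, second assignment.
example_hyps, example_concl = compile_path(
    [("assume", "x >= 0"),
     ("assign", "y", "x + 1"),
     ("branch", "y > 5", False),
     ("assign", "y", "y * 2")],
    "y >= 2",
)
```

Here the two assignments to y produce the distinct names y_1 and y_2,
so the resulting verification condition never uses one name for two
different values.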

We have assumed in rules (1) - (11) that expressions are
free of embedded function calls. If this is not the case we
must modify our rules, as the following example illustrates.
Suppose the statement at @i is:

x := {y + f(z) : z in s};

where f is declared by the statement:

proc f(w1) returns w2;

and has input assumption FM1 and output assertion FM2. To
verify that the input assumption of f is satisfied, we must
generate the verification condition:

hypotheses -> A z in s | FM1(w1 \ z)

We then generate a new temporary variable T, and a new variable
xnew, and add to hypotheses the formula:

A v in T | E z in s, t1 | v = y + t1 & FM2(w1 \ z, w2 \ t1)
& xnew = T

A general rule which accomplishes this is as follows.
Suppose a function F is declared as in (6), and a function call
of the form

F(EXP1,...,EXPj)

occurs within an expression or formula of an executable
statement S. First, suppose all variables of EXP1,...,EXPj are
free in the expression containing the function call. (For
example in the expression:

{x in s | f(y, 4) = x}

the parameter y in the function call f(y, 4) is free.) We
perform the actions below.

(i) First, we generate a verification condition to
guarantee that the input assumption of the function F is
available before S. The verification condition generated is the
same as described in (6).

(ii) Next, we generate new temporary variables T1,...,Tm,
and replace the function call in S by [T1,...,Tm], giving S'.
We generate a hypothesis for S' according to the above rules (1)
- (11).

(iii) We then add the following formula to hypotheses.

FM2(Z1 \ T1,...,Zm \ Tm, Y1 \ EXP1,...,Yj \ EXPj) \ current


Next, suppose a variable V used in the function call is a
bound variable. (For example, in the expression

{x in s | f(x, 4) = y}

the parameter x is bound.) We will assume that the function call
occurs within a set former of the form:

(12)        {SEXP : V in W | FM}

(If this is not the case, then the expression containing the
function call can be transformed into an expression with a set
former.) We perform the following actions.

(i) First, we generate a verification condition to verify
the input assumption of F as follows. Let FM' be

A V in W | FM1(Y1 \ EXP1,...,Yj \ EXPj).

We then generate a verification condition

hypotheses ->
    FM' \ current

(ii) We generate a variable T to denote the value of the
set expression (12), and replace (12) by T in S, giving S'. We
then generate a hypothesis for S' using rules (1) - (11).

(iii) As above, we generate T1,...,Tm to denote the values
returned by F. We then let FM' be the formula:

(14)    A U in T | E V in W | U = SEXP & FM &
            FM2(Z1 \ T1,...,Zm \ Tm, Y1 \ EXP1,...,Yj \ EXPj)

We replace the occurrence of the function call in FM' by
[T1,...,Tm], and form the hypothesis

FM' \ current

Once all the statements along the path from @1 to @n-1 are
processed, we have a complete set of hypotheses for the
verification condition. The statement at @n is an assumption,
which we transform into a conclusion, using rule (1), and
finally generate the verification condition for the path from @'
to @.

As an example of the use of these rules, consider the
search praa (1) in section 1.4. We use the praa entry point as
the starting place, and generate six verification conditions
since there are four distinct paths to an exit place and two
function calls. The verification conditions are as follows:


(VC1)   sorted(a) & b : int &
        right < left & jnew <-> false
        ->
        jnew <-> (E left <= x <= right | a(x) = b)

(VC2)   sorted(a) & b : int &
        right > left & inew in {left <= k <= right}
        & a(inew) = b & jnew <-> true
        ->
        jnew <-> (E left <= x <= right | a(x) = b)

(VC3)   sorted(a) & b : int &
        right > left & inew in {left <= k <= right} &
        a(inew) ~= b & a(inew) > b
        ->
        sorted(a) & b : int

(VC4)   sorted(a) & b : int &
        right > left & inew in {left <= k <= right} &
        a(inew) ~= b & a(inew) > b &
        jnew <-> (E left <= x <= inew - 1 | a(x) = b)
        ->
        jnew <-> (E left <= x <= right | a(x) = b)

(VC5)   sorted(a) & b : int &
        right > left &
        inew in {left <= k <= right} &
        a(inew) ~= b & a(inew) <= b
        ->
        sorted(a) & b : int

(VC6)   sorted(a) & b : int
        & right > left & inew in {left <= k <= right} &
        a(inew) ~= b & a(inew) <= b &
        jnew <-> (E inew + 1 <= x <= right | a(x) = b)
        ->
        jnew <-> (E left <= x <= right | a(x) = b)
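
Before attempting a formal proof, a verification condition such as
(VC4) can be tested for plausibility by exhaustive search over small
cases. The Python sketch below is not a proof and is not part of the
system described here; it assumes 0-based tuples for the map a, reads
sorted(a) as "nondecreasing between left and right", and uses the
hypothetical helper names `vc4` and `counterexamples`.

```python
from itertools import product


def exists(lo, hi, pred):
    """Bounded quantifier (E lo <= x <= hi | pred(x))."""
    return any(pred(x) for x in range(lo, hi + 1))


def vc4(a, left, right, b, inew, jnew, require_sorted=True):
    """Evaluate (VC4) as an implication for one concrete assignment."""
    is_sorted = all(a[i] <= a[i + 1] for i in range(left, right))
    hypotheses = (
        (is_sorted or not require_sorted)
        and right > left
        and left <= inew <= right
        and a[inew] != b and a[inew] > b
        and jnew == exists(left, inew - 1, lambda x: a[x] == b)
    )
    conclusion = jnew == exists(left, right, lambda x: a[x] == b)
    return (not hypotheses) or conclusion


def counterexamples(max_len=4, values=range(4), require_sorted=True):
    """All small assignments that falsify (VC4)."""
    found = []
    for n in range(2, max_len + 1):
        left, right = 0, n - 1
        for a in product(values, repeat=n):
            for b, inew, jnew in product(values, range(n), (False, True)):
                if not vc4(a, left, right, b, inew, jnew, require_sorted):
                    found.append((a, b, inew, jnew))
    return found
```

With the hypothesis sorted(a) the search finds no counterexample in
this range; dropping it (require_sorted=False) immediately produces
some, which indicates that the sortedness hypothesis is essential to
(VC4).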



3.3  Remarks on Praa Termination

A praa will terminate unless (i) there is an infinite loop;
(ii) there is a recursive function or group of mutually
recursive functions which does not terminate; or (iii) an
attempt is made to extract an element from the null set, in
which case the praa aborts. To justify some of the
transformations to be considered below, we will need to show
termination, i.e., need to show that none of (i), (ii), or (iii)
is possible. How can this be done?

First consider (iii). To show that a praa does not abort,
we only need to show that the assertion

EXP ~= {}

is available before every expression of the form:

arb EXP
BINOP / EXP
LHS from EXP;


Next consider (i). To show that looping is impossible, we
must show that no while loop or backwards branch statement
results in infinite loops. (Note that a forall loop over a
finite set always terminates.) For these 'dangerous' control
constructions, we can use the standard technique for proving
termination. For backwards branch constructs, we must show that
there is a function g(X1,...,Xn), where X1,...,Xn are program
variables, such that the range of g is well-ordered, and such
that g(X1,...,Xn) decreases at each loop iteration.

For every backwards branch

@   if FM then go to L; end;

let X1,...,Xk be the variables modified between the definition
of L and @, and X1',...,Xk' be new variables. If we insert the
assignments:

X1' := X1;
...
Xk' := Xk;

after the definition of L, we must show that there is a function
g such that the assertion

0 <= g(X1,...,Xk) < g(X1',...,Xk')

is available at @.

We use a similar technique to show that recursive functions
terminate. Namely, if F is a function with input parameters
X1,...,Xk, for every recursive call of the form:

F(EXP1,...,EXPk)

we must show there is a function g such that the assertion

0 <= g(EXP1,...,EXPk) < g(X1,...,Xk)

is available before the recursive call.
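
As an illustration of the variant-function technique, the following
Python sketch annotates a recursive search with a run-time check that
a measure g strictly decreases and stays nonnegative before each
recursive call. The function `search`, the measure `g` and the
midpoint choice are assumptions made for this sketch; they are not
taken from the search praa of section 1.4, and a run-time assert is of
course weaker than a proved assertion.

```python
def g(left: int, right: int) -> int:
    """Candidate variant: number of indices still under consideration."""
    return max(0, right - left + 1)


def search(a, b, left: int, right: int) -> bool:
    """Is b among a[left..right]?  `a` is assumed nondecreasing."""
    if right < left:
        return False
    i = (left + right) // 2        # any index in left..right would do
    if a[i] == b:
        return True
    if a[i] > b:
        new_left, new_right = left, i - 1
    else:
        new_left, new_right = i + 1, right
    # Termination obligation for the recursive call:
    #   0 <= g(EXP1,...,EXPk) < g(X1,...,Xk)
    assert 0 <= g(new_left, new_right) < g(left, right)
    return search(a, b, new_left, new_right)
```

Since g takes values in the nonnegative integers, which are
well-ordered, a strict decrease at every call rules out an infinite
chain of recursive calls.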

3.4  The Set Theory Proof Checker Component

In this section we outline the general capabilities of a
set theory proof checker, assumed to be available as a major
component of our hypothetical transformation system; a detailed
specification of all the features of such a proof checker is
beyond the scope of this thesis. First, some comments on the
present state of the art in this regard. Existing proof
checkers having some degree of set-theoretic capability include
the systems of Bledsoe [BlBr74], Suppes' EXCHEK system [AST76],
which is used for computer aided instruction, and the still very
fragmentary system of Schwartz [Schw79], which is currently
being developed. We now review these theorem proving
techniques, partly to enable us to estimate the number of steps
required to verify the case study to be presented in Chapter 5,
but also to convey a general understanding of what is involved
in proof checking.

We assume that the proof checker to be used with our
hypothetical program verification system would provide a
convenient overall framework within which the user can guide a
proof while the system verifies that each step is sound. To
prove a theorem, the user will first enter a list of hypotheses
or assumptions. From these, assertions can then be obtained by
applying rules of inference. Each primitive command to the
system applies a rule of inference to existing assumptions and
assertions. In the 'natural deduction' framework which we
assume our verifier will provide, it will then be noted that
each assertion obtained is 'conditional' on the set of assumptions
used to derive it. Subsequently, if an assertion S is
conditional on assumptions S1,...,Sk, then

S1 & S2 ... & Sk   ->   S

is a theorem which can be obtained by application of a general
rule for 'discharging assumptions'.

The  verifier  would  maintain  a  library  of  theorems  whose 
free  variables  could  be  replaced  by  arbitrary  expressions  and 
formulae,  when  the  theorems  were  applied  to  deduce  new  theorems. 

Among the various rules of inference which could be
included in a verifier of the sort we have in mind, the
following are fundamental. (We abbreviate many of these rules
in the form

E1   >>   E2

which means that if E1 is a list of assumptions and assertions,
we can add E2 as an assertion which depends on the assumptions
in E1.)

(1)  Definition Expansion

If a name D has been predefined as

D(X1,...,Xn) = EXP

or

D(X1,...,Xn) <-> FM

then the term

D(EXP1,...,EXPn)

can be replaced by

EXP(X1 \ EXP1,...,Xn \ EXPn)

or

FM(X1 \ EXP1,...,Xn \ EXPn).

(2)  Universal Quantifier Introduction and Elimination

The universal quantifier elimination rule is:

(A X | FM)   >>   FM(X \ EXP)

and the universal quantifier introduction rule is:

FM   >>   A X | FM(Y \ X)

provided that the variable Y is not used elsewhere in an
assumption or assertion, and X is not free in FM.

(3)  Existential Quantifier Introduction and Elimination

The elimination rule is:

E X | FM   >>   FM(X \ X0)

where X0 is a new variable.
The introduction rule is:

FM   >>   E X | FM(EXP \ X)

where X is not free in FM, and EXP is any term.

(4)  Decision Procedures

In a proof verifier of the sort we envisage, decision
algorithms for various sublanguages of our praa language will
provide the means for determining the validity of formulae in
the sublanguage in a single proof checker step. Moreover, the
technique of Nelson and Oppen [NO78] will allow us to combine
decision procedures for sublanguages with disjoint sets of
non-logical symbols. Decision algorithms for the following
sublanguages are available (see [Schw78], [FOS79]).

Sublanguage                     Symbols

Propositional Calculus          propositional connectives,
                                equality,
                                uninterpreted function symbols.

Elementary Boolean Theory       set union, intersection,
of Sets                         difference, and subset.

Presburger Arithmetic           integer addition, subtraction,
                                inequality, quantification over
                                integer variables.

Multilevel Syllogistic          set union, intersection,
                                difference,
                                subset, membership.

Extensions of Multilevel        singleton, #,
Syllogistic                     +, -, <=, domain,
                                the set Z of integers, the
                                class 0 of all ordinals,
                                map restriction, domain, range,
                                one-one.

Behmann                         union, intersection, subset,
                                #, +, -, <=, quantification
                                over sets.

Tuples                          [], [y], II, tl(i:j), range, #

(5)  Blobbing

Sometimes if a formula contains symbols which are not part
of a decidable sublanguage, a decision algorithm can
nevertheless be applied if we replace certain terms of the
formula by variables. This technique is called blobbing
[Schw79b]. For example, consider the formula:

x in ({y in s | y ~subset t} + {z in s1 | z = f(z)})
<->
x in {y in s | y ~subset t} or x in {z in s1 | z = f(z)}

This formula is not directly decidable since it contains general
set former expressions. However, the formula obtained by
blobbing the set formers is decidable:

x in (A + B) <-> x in A or x in B.

(6)  Reductions

Reductions are rules for replacing expressions containing
set formers by expressions with fewer or simpler set formers. The
following examples illustrate some of the more important rules
of this sort.

X in {Y | FM}
>>
FM(Y \ X)

X in {EXP : Y | FM}
>>
E Y | FM & X = EXP

{X | FM} + {Y | FM1}
>>
{X | FM or FM1(Y \ X)}

{EXP : X | FM} * {EXP1 : Y | FM1}
>>
{EXP : X | FM & (E Y | FM1 & EXP1 = EXP)}

{EXP1 : X in {EXP2 : Y in S | FM1} | FM2}
>>
{EXP1(X \ EXP2) : Y in S | FM1 & FM2(X \ EXP2)}
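
The reductions above can be sanity-checked on a finite universe, where
set formers become Python set comprehensions. The sketch below is an
illustration only: the universe U and the particular choices of FM,
FM1, FM2, EXP, EXP1 and EXP2 are arbitrary assumptions, and agreement
on one finite instance is evidence for a rule, not a proof of it.

```python
U = range(-3, 8)                    # small finite universe (assumed)


def FM(x):   return x % 2 == 0      # sample formulae and expressions,
def FM1(y):  return y > 2           # chosen arbitrarily for this sketch
def FM2(x):  return x < 30
def EXP(x):  return x * x
def EXP1(y): return y + 1
def EXP2(y): return 2 * y


def check_reductions():
    """Compare both sides of each reduction on the finite universe U."""
    results = {}

    # X in {Y | FM}   >>   FM(Y \ X)
    results["membership"] = all(
        (x in {y for y in U if FM(y)}) == FM(x) for x in U)

    # X in {EXP : Y | FM}   >>   E Y | FM & X = EXP
    image = {EXP(y) for y in U if FM(y)}
    results["image"] = all(
        (x in image) == any(FM(y) and x == EXP(y) for y in U)
        for x in range(-10, 70))

    # {X | FM} + {Y | FM1}   >>   {X | FM or FM1(Y \ X)}
    results["union"] = (
        {x for x in U if FM(x)} | {y for y in U if FM1(y)}
        == {x for x in U if FM(x) or FM1(x)})

    # {EXP : X | FM} * {EXP1 : Y | FM1}
    #   >>   {EXP : X | FM & (E Y | FM1 & EXP1 = EXP)}
    results["intersection"] = (
        {EXP(x) for x in U if FM(x)} & {EXP1(y) for y in U if FM1(y)}
        == {EXP(x) for x in U
            if FM(x) and any(FM1(y) and EXP1(y) == EXP(x) for y in U)})

    # {EXP1 : X in {EXP2 : Y in S | FM1} | FM2}
    #   >>   {EXP1(X \ EXP2) : Y in S | FM1 & FM2(X \ EXP2)}
    inner = {EXP2(y) for y in U if FM1(y)}
    results["nested"] = (
        {EXP1(x) for x in inner if FM2(x)}
        == {EXP1(EXP2(y)) for y in U if FM1(y) and FM2(EXP2(y))})

    return results
```

Each entry of the returned dictionary records whether the left and
right sides of the corresponding rule agree on this instance.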

The following reduction rules involve maps, and are particularly
important for program verification.

{[X, EXP] : X in S} (Y)
>>
if Y in S then EXP(X \ Y) else {[X, EXP] : X in S} end

({[EXP1, EXP2]} + {[X, Y] in S | X ~= EXP1}) (EXP3)
>>
if EXP1 = EXP3 then EXP2 else S(EXP3) end

({[EXP1, Y] : Y in EXP2} +
    {[X, Y] in S | X ~= EXP1}) {EXP3}
>>
if EXP3 = EXP1 then EXP2 else S{EXP3} end

The last two rules correspond to the array rule of the Stanford
PASCAL Verifier.

Examples of rules involving compound operators include the
following:

X = + / EXP
>>
A Y in X | E Z in EXP | Y in Z
&
A Y in EXP | Y subset X

X = max / EXP
>>
X in EXP & A Y in EXP | X >= Y

As a simple example of the use of these rules, let us

consider   the  verification   condition  (VC4)  in  section  3.2.   We 
assume  that  we  have  the  following  theorem: 


(1)      sorted(a,    left,    right)      -> 

(A   left    <=  X   <=   right,    left    <=  y   <=  x    | 
a(x)    >=  a(y)    ) 


then  we  assume 

(2)  sorted(a,  left,  right) 

(3)  right  >  left 

(4)  inew  in  {left  <=  k.  <=  right} 

(5)  a(inew)  >  b 

(6)  jnew  <->  (E  left  <=  x  <=  inew  -  1  |  a(x)  =  b) 
Suppose  first  that 

(7)  E  left  <=  X  <=  inew  -  1  |  a(x)  =  b 
We  instantiate  (6)  by  E-elimination: 

(8)  left  <=  xO  &  xO  <=  inew-1  &  a(xO)  =  b 
By  reduction  of  (4),  we  obtain: 

(9)  left  <=  inew  &  inew  <=  right 

By  applying  a  decision  prodecure  using  (8)  and  (9)  we  can  show: 

(10)  left    <=  xO   &   xO   <=   right 

Finally,  by  E-introduction,  from  (10)  and  (8)  we  obtain: 

(11)  E  left  <=  X  <=  right  |  a(x)  =  b. 
We  have  therefore  shown  that: 

(3)  &  (4)   &  (7)   ->  (10) 
Conversely,  instead  of  (7),  suppose 

(12)  A left <= x <= inew - 1 | a(x) ~= b
We assume

(13)  E left <= x <= right | a(x) = b

and will derive a contradiction.
Specifically, instantiating (13), we obtain:

(14)  left <= x0 & x0 <= right & a(x0) = b
We then instantiate (12) using x0:

(15)  left <= x0 & x0 <= inew - 1 -> a(x0) ~= b
From (9), (14), and (15), we can show:

(16)  x0 >= inew.

Next, we instantiate (1) using x0 and inew to obtain:

(17)  left <= x0 & x0 <= right & left <= inew &
      inew <= x0 -> a(inew) <= a(x0)

Then from (5), (17), (16), and (14) we obtain a contradiction.

We  have  thereby  shown  that 

(1) & (3) & (4) & (5) & (12)

->
~(E left <= x <= right | a(x) = b)

which completes the proof of (VC4). Of course, more powerful
semi-automatic proof verifiers can reduce the number of steps
apparent in the preceding discussion.
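The two halves of this argument together say that, under hypotheses (2) - (5), b occurs in a between left and inew - 1 exactly when it occurs between left and right. The following Python sketch tests that statement exhaustively on small arrays. It is a sanity check under stated assumptions, not a proof: arrays are 0-indexed tuples, left = 0, right = len(a) - 1, and the names found and count_counterexamples are hypothetical.

```python
from itertools import combinations_with_replacement


def found(a, lo, hi, b):
    """E lo <= x <= hi | a(x) = b"""
    return any(a[x] == b for x in range(lo, hi + 1))


def count_counterexamples(max_len=4, values=range(4)):
    """Count small cases in which the two existentials disagree."""
    bad = 0
    for n in range(2, max_len + 1):                          # (3) right > left
        # combinations_with_replacement yields non-decreasing tuples: (2)
        for a in combinations_with_replacement(values, n):
            left, right = 0, n - 1
            for inew in range(left, right + 1):              # (4)
                for b in values:
                    if not a[inew] > b:                      # (5)
                        continue
                    if found(a, left, inew - 1, b) != found(a, left, right, b):
                        bad += 1
    return bad
```

A call to count_counterexamples() enumerates every sorted array of length 2 to 4 over the values 0 to 3 and counts the cases that violate the equivalence.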

We  will  give   more   extensive   examples   of   this   sort   of 
verification  in  section  5.2.8. 


CHAPTER  4 
Transformations  * 


In  this  chapter  we  will  describe  a  set  of  basic 
transformational  capabilities  which  would  be  built  into  our 
proposed  system.  In  section  4.2  we  will  give  two  examples  of 
algorithm  derivation  using  these  transformations.  We  choose  to 
begin  with  an  overview  of  the  way  these  transformations  are  to 
be  applied. 

The  function  of  our  transformation/verification  system  is 
to  execute  transformation  commands  while  preventing  the  user 
from  introducing  errors.  Of  course,  most  of  the  creative 
aspects  of  a  program  proof  or  derivation  will  have  to  be 
supplied by the user. The system can be regarded as a
semantically knowledgeable editor which allows program changes
only if the resulting program remains correct. As explained in
the next section, the basic set of transformation commands
includes substitute, insert, delete, and move, which resemble
basic editor commands.

A main feature of the system is that transformations will
generally be allowed without proof being demanded immediately.
If the system cannot detect that the transformation is correct,
then new assumptions are introduced into the text to guarantee
correctness of the resulting praa. In simple cases, the system
will automatically verify the enabling assumptions. Otherwise,
it then becomes the user's responsibility to remove the
additional assumptions by supplying their proofs either manually
or semi-automatically.

Often intended code transformations cannot be performed in
one step. That is, the user may specify a transformation which
he hopes will give a praa R2 from R1, but the system may return
a praa R1' which is intermediate between R1 and R2. As we shall
see, R1' will typically contain new temporary variables and
assumptions involving these variables. The new variables denote
the original values of variables which are modified by the
transformation. The assumptions express a relationship between
the new and old values of these variables. Verification of
these assumptions will then be necessary before R1' can be
transformed into R2.

An interactive transformation session consists of a
sequence of transformation steps whereby the user verifies or
transforms a program (or both). Verification and transformation
steps may be freely interspersed during the course of a session.
A verification step is simply a transformation which only
modifies the assumptions and assertions of a praa, including
commands to the proof checker.

Certain  important  subsequences  of  transformations  occur 
frequently  in  praa  derivations;  these  may,  for  example,  realize 
commonly  occurring  optimization  and  data  structure  choice 
techniques.  We  expect  the  usefulness  of  the  system  to  depend 
strongly  on  the  ease  with  which  transformations  embodying 
generally  known  and  useful  techniques  can  be  invoked.  We 
therefore  propose  to  implement  such  subsequences  as  single 
transformations,  which  we  call  derived  transformations.  The 
library  of  derived  transformations  will  be  extendable;  users 
should  be  able  to  add  useful  new  transformations  as  experience 
with  the  system  dictates. 

4.1  Program  Transformations 

We  now  describe  a  collection  of  useful  transformations, 
beginning  with  substitution,  insertion,  and  deletion. 

4.1.1  Substitution 

The  basic  substitution  rule  states  that  an  expression  or 
formula  can  be  replaced  by  another  expression  or  formula 
provided  that  they  yield  the  same  value.  A  substitution 
transformation  introduces  an  assumption  into  the  program  text, 
the  precise  form  of  which  depends  on  the  forms  of  the 
expressions. 

We first assume that the expressions do not contain
function calls.


(1)  Expression Substitution

X := EXP1;     =>     |= EXP1 = EXP2

                      X := EXP2;

(2)  Formula Substitution

                              |= FM1 <-> FM2
if FM1 then ...     =>     if FM2 then ...


The  next  set  of  rules  are  important  special   cases   of   (1) 
and  (2). 

(3)  Selection Substitution

X := arb EXP1;     =>     |= EXP2 in EXP1

                          X := EXP2;

As special cases of (3), if EXP1 is a set of integers, then the
maximum or minimum element may be selected from the set. This
gives:

(4)   Maximum  Selection 

X  :=  arb  EXP;        =>      |=  EXP  :  set(int) 

X  :=  max  /  EXP; 

and 

(5)  Minimum  Selection 

X  :=  arb  EXP;       =>       |=  EXP  :  set(int) 

X  :=  min  /  EXP; 

(6)  Selection Refinement

X := arb EXP1;     =>     |= EXP2 subset EXP1

                          X := arb EXP2;

(Note that this transformation allows us to replace EXP2 by the
null set since the null set is a subset of any set EXP1. If
this is done the resulting program will not terminate, so that
partial correctness is preserved.)

The  following   rule   embodies   a  useful   special   case   of 
selection  refinement. 

(7)  X := arb {EXP : QE1,...,QEn | FM};     =>
     X := arb {EXP : QE1,...,QEn | FM & FM1};

Next  we  consider  expressions  which  contain  function   calls. 
Suppose  F  is  a  function,  declared  by  the  statement 

proc F(Y1,...,Yk) returns Z1,...,Zn;
and suppose FM1 is the input assumption of F and FM2 is the
output assertion. Then the two following rules apply.

(8)  Replacement of an Expression by a Function Call

[X1,...,Xn] := EXP;

=>

|= FM1(Y1 \ EXP1, ..., Yk \ EXPk) &
   FM2(Y1 \ EXP1, ..., Yk \ EXPk)
   ->
   [Z1,...,Zn] = EXP

[X1,...,Xn] := F(EXP1,...,EXPk);

(8a)

[X1,...,Xn] := arb EXP;

=>

|= FM1(Y1 \ EXP1, ..., Yk \ EXPk) &
   FM2(Y1 \ EXP1, ..., Yk \ EXPk)
   ->
   [Z1,...,Zn] in EXP

[X1,...,Xn] := F(EXP1,...,EXPk);

(9)  Function  Call  Replacement 
To  replace  a   function   call  by   an  expression,   it   is   first 
necessary   to   show   that   the  function  terminates  (see  Deletion 
Rule.)   Once   termination   is   verified,   we   can  perform   the 
following  transformation: 

[X1,...,Xn] := F(EXP1,...,EXPk);
=>

|= FM1(Y1 \ EXP1, ..., Yk \ EXPk) ->
   FM2(Y1 \ EXP1, ..., Yk \ EXPk, Z1 \ EXP(1),
       ..., Zn \ EXP(n))

[X1,...,Xn] := EXP;

Substitution  rules  for  other  language  dictions   can   readily  be 

obtained  from  these  rules  and  the  compilation  transformations. 

4.1.2  Insertion  Transformations 

Insertion  transformations  allow  blocks  of  code,  possibly 
annotated,  to  be  inserted  into  a  praa  R.  To  prepare  for  the 
description  of  these  rules,  we  first  give  the  following 
definitions. 

A variable X is said to be dead at a place @ if there is a
redefinition of X along all execution paths from @ to a use of X
in an expression, assumption, or assertion. Conversely, if
there occurs some use of X along an execution path from @ which
is not preceded by any redefinition, then the use is said to be
chained to the place @, and the variable X is said to be live at
@.

Suppose SB1,...,SBn are blocks of code which are to be
inserted before places @1,...,@n of R. The following conditions
are sufficient for preservation of correctness.

For  all  SBi,  1  <=  i  <=  n, 

(i)  SBi  is  partially  correct. 

(ii)  All  variables  modified  in  SBi  are  dead  at  @i  in  R. 

(iii)  SBi  is  single  entry/single  exit;  that  is,  all  labels 
referenced  in  SBi  are  defined  in  SBi. 

(iv)  If a label L is defined in SBi, then it is not defined
in R or SBj, 1 <= j <= n, j ~= i.

The  enabling  conditions  (ii)  -  (iv)  are  simple  enough  to  be 
checked  automatically  before  the  transformation  is  performed. 
Therefore,  no  additional  assumptions  are  introduced  into  the 
text  by  the  insertion  transformation. 
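Enabling condition (ii) is an ordinary live-variable question. A minimal Python sketch of such a check follows, under stated assumptions: the program is given as a hypothetical control-flow table mapping each place to (uses, defs, successors), uses in assumptions and assertions are counted among the uses, and the names live_in and may_insert_before are illustrative only.

```python
def live_in(cfg):
    """cfg: {place: (uses, defs, successors)}.
    Returns {place: set of variables live on entry}, by backward iteration."""
    live = {n: set() for n in cfg}
    changed = True
    while changed:
        changed = False
        for n, (uses, defs, succs) in cfg.items():
            out = set()
            for s in succs:
                out |= live[s]
            new = set(uses) | (out - set(defs))
            if new != live[n]:
                live[n] = new
                changed = True
    return live


def may_insert_before(cfg, place, modified):
    """Condition (ii): every variable modified by the block is dead at place."""
    return not (set(modified) & live_in(cfg)[place])


# A four-statement example: x := 1; y := x + 1; x := 0; use of x and y.
example = {
    1: ((), ("x",), (2,)),
    2: (("x",), ("y",), (3,)),
    3: ((), ("x",), (4,)),
    4: (("x", "y"), (), ()),
}
```

In the example, x is redefined at place 3 before its next use, so a block modifying only x may be inserted before place 3, while a block modifying y may not.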

4.1.3  Deletion  Transformations 
The deletion transformation allows blocks of code to be
deleted  from  a  praa.  Deletion  is  the  dual  of  insertion,  and  the 
enabling  conditions  for  these  two  transformations  are  similar. 
However,  some  extra  care  must  be  taken  in  the  case  of  deletion, 
as  the  following  example  shows.  Suppose  X  is  a  dead  variable, 
so  we  can  introduce  the  following  code  into  a  praa  using  the 
insertion  rule. 

(1)  (while  3  <  5) 

X  :=  X  +  1; 
end; 

The  assertion 

(2)  |- 3 >= 5

will  then  become  available  after  (1),  but  since  (1)  does  not 
terminate,  partial  correctness  is  maintained.  From  (2)  any 
proposition  can  be  proven.  Suppose  now  that  we  then  try  to 
delete  (1),  using  the  enabling  condition  that  X  is  dead  after 
(1).  Then  the  praa  will  again  terminate,  and  it  will  no  longer 
be  partially  correct,  since  it  contains  false  assertions.  We 
therefore  must  include  the  additional  restriction  that  in  order 
to  delete  a  block  of  code  SB,  false  must  not  be  available  after 
SB.  Note  that  any  attempt  to  extract  an  element  from  the  null 
set  also  results  in  a  false  verification  hypothesis.  Therefore, 
before  deleting  a  statement  of  the  form 

X  :=  arb  EXP; 
we  must  verify  that  EXP  is  not  the  empty  set.   Overall,  the  rule 
for  deletion  validity  is  as  follows. 

Suppose SB1,...,SBn are blocks of code contained in a praa
R at places after @1,...,@n respectively. Before deleting
these blocks, we must show that each block SBi terminates. For
this, let R' be the praa obtained by deleting SBi, 1 <= i <= n,
from R. If

(i)  SBi contains no assumptions.

(ii)  All variables modified in SBi are dead at @i in R'.

(iii)  SBi  is  single  entry/single  exit. 

then  the  deletion  preserves  correctness. 

4.1.4  Data  Structure  Transformations 

Data  structure  transformations  can  be  achieved  by  a 
sequence  of  insert,  substitute,  and  delete  transformations  as 
follows.  Suppose  that  a  data  structure  is  represented  by 
variables  x  and  y,  and  that  we  wish  to  pass  to  an  alternate 
representation  which  uses  new  variables  a  and  b.  Since  a  and  b 
are  initially  dead,  we  can  introduce  assignments  to  a  and  b  by 
the  insertion  transformation.  We  can  then  replace  uses  of  the 
variables  x  and  y  by  uses  of  the  variables  a  and  b.  This  step 
generally requires verification; that is, we must generally
prove that an assertion

P(x, y, a, b)
holds and is such that for all expressions g(x, y) occurring in
the praa,

E f | P(x, y, a, b) -> g(x, y) = f(a, b),
where  f  is  some  auxiliary  expression.   We   can   then  substitute 
f(a,  b)  for  g(x,  y).   This  transformation  will  eliminate  all  old 
variables  x  and  y  and  we  can  then  delete  assignments  to  x  and  y, 
thus  completing  the  data  structure  transformation. 

Derived  data  structure  transformation  rules  are  single 
transformations  which  accomplish  the  sequence  of  steps  described 
above,  and  which  embody  well  known  data  structure 
implementations.   We  give  two  simple  examples  below. 

(A)  Representation  of  a  workpile  by  a  stack 

Suppose  that  a  set  s  is  used  as  a  workpile.  That  is,  all 
modifications  to  s  in  a  praa  R  are  of  the  form: 

(1)  |- f : bmap

     s := {f(x) : x | q(x)};

(2)  |- z ~in s

     s := s + {z};

(Disjoint set union.)

(3)  s := s - {y};

and all uses of s in the praa R are of the form

(4)  y := arb s;
and

(5)  while s ~= {}
Then we can represent s easily by a stack t such that the
assertion

|- t : tuple & range t = s
is maintained. We replace (1) by
t := [f(x) : x | q(x)];

(2) by

t := t || [z];

(3) by

|= t(#t) = y
t := t(1:#t-1);

(4)  by 

y  :=  t(#t); 
and  (5)  by 

(while  t  ~=  []). 
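The correspondence between the set s and the stack t can be exercised by running both representations in lockstep. The Python sketch below is illustrative only and rests on stated assumptions: the function run_workpile and its successors parameter are hypothetical, the bmap condition on f is read here as guaranteeing that the initial tuple has no duplicates, and a seen set is used to guarantee z ~in s before each addition.

```python
def run_workpile(initial, successors):
    """Process a workpile kept both as a set s and as a stack t.
    initial: iterable of distinct items; successors(y): items reached from y."""
    s = set(initial)
    t = list(s)              # (1): a duplicate-free enumeration of s
    seen = set(s)
    order = []
    while t:                 # (5): while t ~= []
        # maintained assertion: t is a tuple and range t = s
        assert set(t) == s and len(t) == len(s)
        y = t[-1]            # (4): y := t(#t), one admissible value of arb s
        s = s - {y}          # (3) on the set
        assert t[-1] == y    # enabling assumption t(#t) = y
        t = t[:-1]           # (3) on the stack: t := t(1:#t-1)
        order.append(y)
        for z in successors(y):
            if z not in seen:        # ensures z ~in s
                seen.add(z)
                s = s | {z}          # (2) on the set
                t = t + [z]          # (2) on the stack: t := t || [z]
    assert s == set()
    return order
```

Used as a graph search, run_workpile([1], lambda y: graph[y]) visits every vertex reachable from 1 exactly once, with the internal assertions checking the maintained invariant at each iteration.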

(B)  Representation  of  a  graph  by  an  adjacency  list 

Suppose that v is a set of graph vertices and e is a set of
graph edges, such that the following assertion is available:

|- v = {1 <= i <= k} & e : map(v) v
For various graph searching applications, an adjacency list
representation is suitable. In particular, suppose all
modifications to e are of the form:


(7)  |- x in v & y in v
     e := e + {[x, y]};

and  all  uses  of  e  are  of  the  form: 
(8)  (A y in e{x})

     end;

Then we can introduce new variables nodes and next such that the
following assertion is maintained:


(8)  |- nodes : tuple & next : tuple &

        e = {[x, y] : 1 <= x <= k, 1 <= y <= k |

             next(x) ~= 0 &
             E t | t : tuple & t(1) = nodes(next(x)) &
                   t(#t) = y &
                   A 1 <= j < #t | t(j + 1) = nodes(next(t(j)))}

After this we replace (7) by


nodes := nodes || [y];
next := next || [next(x)];
next(x) := #nodes;

and replace (8) by

n := next(x);
(while n ~= 0)
    y := nodes(n);


    n := next(n);
end;
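The replacement code for (7) and (8) can be tried out directly. The following Python sketch is an illustration under stated assumptions: the initial state (both tuples of length k, with every next entry 0) is not given in the text and is assumed here, index 0 of each list is left unused so that positions match 1-based tuples, and the function names are hypothetical.

```python
def make_graph(k):
    """Assumed initial state: k head slots, all lists empty (next(x) = 0)."""
    nodes = [None] * (k + 1)   # index 0 unused
    nxt = [0] * (k + 1)        # index 0 unused
    return nodes, nxt


def add_edge(nodes, nxt, x, y):
    """Replacement for (7): e := e + {[x, y]}."""
    nodes.append(y)            # nodes := nodes || [y]
    nxt.append(nxt[x])         # next := next || [next(x)]
    nxt[x] = len(nodes) - 1    # next(x) := #nodes


def successors(nodes, nxt, x):
    """Replacement for (8): enumerate the y in e{x}."""
    out = []
    n = nxt[x]
    while n != 0:
        out.append(nodes[n])
        n = nxt[n]
    return out
```

After any sequence of add_edge calls, the set of values returned by successors(nodes, nxt, x) should equal e{x} for the corresponding edge set e. Repeated insertions of the same edge appear more than once in the list, although they are absorbed by the set union in (7).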


A  more   complex   data   structure   transformation   for   the 
union-find  problem  is  presented  in  section  4.2.1.3. 


The  next  group  of  rules,  block  substitution,  assignment 
insertion/deletion,  and  code  motion,  can  be  derived  from  the 
rules  already  stated;  that  is,  their  effect  can  be  obtained  by 
applying   sequences   of   insertion,   deletion,   and  substitution 
transformations. However, they are important enough to be
treated  separately. 

4.1.5  Block  Substitution 

Block  substitution  is  a  transformation  which  replaces  a 
single  assignment  statement  by  a  block  of  code.  Often,  the 
replaced  statement  will  contain  a  set  former  expression  from 
which  a  selection  is  made  and  we  will  use  block  substitution  to 
replace  this  selection  statement  by  code  which  computes  an 
element  that  might  have  been  selected.  Block  substitution  may 
also  be  used  to  replace  an  expression  by  an  explicit 
computation. 

We  first  state  a  simple  statement  removal  rule,  from  which 
the  general  block  substitution  rule  can  be  derived. 

(1)  Null  Block  Substitution 

[X1,...,Xn] := EXP;     =>     |= [X1,...,Xn] = EXP
and

[X1,...,Xn] := arb EXP;     =>     |= [X1,...,Xn] in EXP
(If  EXP  is  a  function  call,   then  we  generate   the  assumption 
described  in  section  4.1.1  (9).) 

We  now  state  a  more  general  block  substitution  rule. 
Suppose we wish to replace a statement at @ of the form
(2)  [X1,...,Xn] := EXP;
or

(3)  [X1,...,Xn] := arb EXP;

by a code block SB, where Xi1,...,Xik are the variables from
X1,...,Xn which are used in EXP. Let Xi1',...,Xik' be new
variables not used in SB or R. The block substitution for (2)
can be achieved by (i) inserting the assignments

Xi1' := Xi1;
...
Xik' := Xik;
before (2); (ii) changing uses of Xik in (2) to uses of Xik',
(iii) inserting the code block SB before (2) using the insertion
rule, and finally (iv) deleting (2) by the rule (1). Combining
these steps, we obtain the following rule.

(4)  [X1,...,Xn] := EXP;

=>
Xi1' := Xi1;
...
Xik' := Xik;
SB
|= [X1,...,Xn] = EXP(Xi1\Xi1',...,Xik\Xik')

and

(5)  [X1,...,Xn] := arb EXP;

=>
Xi1' := Xi1;
...
Xik' := Xik;
SB
|= [X1,...,Xn] in EXP(Xi1\Xi1',...,Xik\Xik')

provided  that: 

(i)  SB  is  partially  correct 

(ii)  For  all  variables  X  modified  in  SB,  either  X  =  Xi   for 
1  <=  i  <=  n  or  X  is  dead  at  @. 

(iii)  SB  is  single  entry/single  exit. 

The next set of transformations, called assignment
insertion,  assignment  deletion,  and  code  motion,  are  designed  to 
facilitate  the  manipulation  of  maps  and  tuples.  In  particular, 
these  transformations  are  useful  for  storage  optimization  and 
loop  jamming. 

4.1.6  Assignment Insertion

According to the basic insertion rule described above, an
assignment statement S can be inserted at a place @ only if the
variable X modified by S is dead at @. However, this condition
is sometimes too restrictive; that is, even if X is not dead at
@, the insertion of an assignment to X can be correctness
preserving in a few useful cases. In particular, if X is not
dead at @, but the insertion of the assignment S has no effect
on the computation states at subsequent uses of X, then the
transformation is correctness preserving. For example, there
may be a use of X chained to @ at a place @' which is
nevertheless unreachable from @ because the execution path from
@ to @' will never actually be taken. Moreover, we will often
want to insert indexed assignments to variables which are not
dead at the point of insertion. This is allowed if the
assignment has the form:

X(EXP1) := EXP;
and if, for example, the range of EXP1 and the range of EXP2 are
disjoint for all uses X(EXP2) chained to the assignment. The
following rule includes this case, and makes precise what we
mean by an assignment having "no effect" on an upwardly exposed
use:

Suppose S is an assignment of the form
(1)  X := EXP;
to be inserted at a place @. Let @1,...,@r be the places where
there are uses of X (in expressions EXPi, 1 <= i <= r) which are
chained to the place @. Suppose that we generate a new shadow
variable X' to store the original value of X. (X' is intended
as a temporary variable which will be deleted after the
justifying assumptions to be stated below have been verified.)
For each use of X at @i, 1 <= i <= r, let @i1,...,@is be the
definitions which are chained to the use of X at @i. After each
statement @ij, 1 <= j <= s, 1 <= i <= r, we insert an assignment
of the form

(2)  X' := X;
to initialize X'. We also insert assignment (1) before @, and
insert assumptions

|= EXPi = EXPi(X\X')
before places @1,...,@r. These assumptions assert that the uses
of X are not affected by the insertion of (1).

To  illustrate  this  rule,  consider  the  code  fragment: 


(3)   @0     a := s + t;
             b := {};


             (forall 1 <= x <= n1)
      @1         b(x) := a(x) + c(x);
      @      end;

             (forall n1 < x <= n2)
      @2         b(x) := a(x) + d(x);

             end;

(Note that the ranges of the two loops shown above are
disjoint.) Suppose we wish to insert the statement

(4)  a(x) := a(x) + 1;

before the place @. The variable a is not dead at @, since
there are uses at @1 and @2 chained to @. However, the use at
@2 is not affected by the insertion since the range of x at @2
is disjoint from the range of x at @. Similarly, the use of a
at @1 is not affected since x is increasing in the loop. In
this example, if we use the assignment insertion transformation
just described, the system will introduce the new variable a',
and form the praa R':


(5)   @0     a := s + t;
      @11    a' := a;
             b := {};


             (forall 1 <= x <= n1)

                 |= a(x) + c(x) = a'(x) + c(x)

      @1         b(x) := a(x) + c(x);

                 a(x) := a(x) + 1;
      @      end;

             (forall n1 < x <= n2)

                 |= a(x) + d(x) = a'(x) + d(x)

      @2         b(x) := a(x) + d(x);
             end;

The new variable a' stores the original value of a. As shown,
assumptions will also have been introduced before @1 and @2 to
preserve correctness. Of these, the assumption before @1 can be
verified by adding the loop invariant

|- A x <= i <= n1 | a'(i) = a(i)
and the assumption before @2 can be verified by adding the loop
invariant

|- A n1 < i <= n2 | a'(i) = a(i)
Once these assumptions have been verified, a' is dead, and can
be eliminated. This gives us the code fragment:


(6)          a := s + t;
             b := {};


             (forall 1 <= x <= n1)

      @1         b(x) := a(x) + c(x);

                 a(x) := a(x) + 1;
             end;

             (forall n1 < x <= n2)
      @2         b(x) := a(x) + d(x);
             end;
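The claim that fragments (3) and (6) compute the same b can be checked on concrete data. The Python sketch below is an illustration under stated assumptions: maps are modelled as dictionaries, the value of s + t is passed in directly as a, and the function names original and transformed are hypothetical. The transformed version keeps the shadow variable of fragment (5) so that the introduced assumptions can be asserted at run time.

```python
def original(a, c, d, n1, n2):
    """Fragment (3): no assignment to a inside the loops."""
    a = dict(a)
    b = {}
    for x in range(1, n1 + 1):
        b[x] = a[x] + c[x]
    for x in range(n1 + 1, n2 + 1):
        b[x] = a[x] + d[x]
    return b


def transformed(a, c, d, n1, n2):
    """Fragment (5): statement (4) inserted, shadow variable a' retained."""
    a = dict(a)
    a_shadow = dict(a)                               # a' := a
    b = {}
    for x in range(1, n1 + 1):
        assert a[x] + c[x] == a_shadow[x] + c[x]     # assumption before @1
        b[x] = a[x] + c[x]
        a[x] = a[x] + 1                              # inserted statement (4)
    for x in range(n1 + 1, n2 + 1):
        assert a[x] + d[x] == a_shadow[x] + d[x]     # assumption before @2
        b[x] = a[x] + d[x]
    return b
```

Both functions return the same map b for any inputs defined on the two index ranges, because each a(x) is read before it is incremented and the second loop never revisits the indices modified by the first.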


There is another way of obtaining this result. Instead of
applying our assignment insertion rule, we could introduce a new
variable a1, insert the assignment

a1(x) := a1(x) + 1;
before @, then use equality substitution to change all uses of a
to uses of a1, and finally eliminate a as a dead variable.
However, use of the assignment insertion transformation has
several advantages over this method:

(a)  The variable a retains its original name, which is
especially desirable if a is used elsewhere in the program.

(b)  The assignment insertion transformation combines
several small steps into a single step.

4.1.7  Assignment  Deletion 

The  assignment  deletion  transformation  is  the  dual  of 
assignment  insertion.  It  enables  us  to  delete  a  statement  which 
modifies  a  variable  X  which  is  not  dead.  The  precise  rule  is  as 
follows. 

Suppose an assignment S of the form
(1)  X := EXP;
is to be deleted from a place @. Let @1,...,@r be the places
where there are uses of X chained to @ and suppose that these
uses occur in expressions EXPi, 1 <= i <= r. (Note that there
may be an upwardly exposed use of X in statement (1) itself.)
Generate a new temporary shadow variable X', to store the value
which X has before the transformation is applied. For each use @i,
1 <= i <= r, let @ij, 1 <= j <= s, be the definitions of X which
reach the use of X at @i. After each statement @ij, 1 <= j <=
s, 1 <= i <= r, @ij ~= @, insert the statement

(2)  X' := X;

to initialize X'. Then insert an assumption of the form

(3)  |= EXPi = EXPi(X\X')

before places @i, 1 <= i <= r, @i ~= @. This assumption asserts
that the value of EXPi is not changed by the transformation.
Next, replace the assignment (1) by the assignment
(4)  X' := EXP(X\X');

Once the assumptions (3) are verified, statements (2) and
(4) can be deleted as dead code.

4.1.8  Code  Motion 

In this section we state a general rule for moving
assignment statements. We assume that the user has specified
that a statement of the form

(1)  X := EXP;

at place @ is to be moved to program place @'. In several
important special cases it is possible to determine the
correctness of this transformation without theorem proving. For
example, if statement (1) is a loop invariant, it can be moved
out of the loop. (However, since it is possible for the loop
condition to be false initially, an additional condition that X
should be dead at the exit place of the loop is required to
ensure correctness.)

We distinguish various special cases to ease the treatment
of certain situations in which assignment motion clearly
preserves correctness.

Code Motion - Case I
Moving statement (1) from @ to @' preserves correctness if

(i)  The place @' predominates the place @.

(ii)  All paths from @' to @ are X-clear.

(iii)  No variable occurring in EXP is modified along a path
from @' to @.

(iv)  X  is  dead  at  @'. 

We  note  that  this  rule  incorporates  a  simple  statement 
interchange  transformation  which  is  implemented  in  several 
source-to-source  transformation  systems. 

The  above  rule  allows  code  motion  in  the  backwards  flow 
direction,  and  there  is  a  similar  rule  for  code  motion  in  the 
forward  flow  direction. 

Code  Motion  -  Case  II 

Moving  statement  (1)  preserves  correctness  if 
(i)  The place @' backdominates the place @.

(ii)  The place @ predominates the place @'.

(iii)  All  paths  from  @  to  @'  are  X-clear. 

(iv)  No  variable  occurring  in  EXP  is  modified  along  a  path 
from  @'  to  @. 

To  move  code  in  other  cases  will  generally  involve  more 
complex  logical  conditions,  and  hence  will  introduce  assumptions 
into  the  program  which  the  user  must  subsequently  verify.  For 
example,  the  user  might  move  a  loop  invariant  statement  of  the 
form  (1)  out  of  a  loop  to  a  place  where  X  is  not  dead.  The  user 
might  know  that  the  loop  will  always  be  executed,  but  then  this 
fact  must  be  verified.  To  handle  this  case  properly,  our  system 
must  find  all  of  the  uses  of  X  in  the  program  which  might  be 
affected  by  the  transformation,  and  ask  the  user  to  prove 
assumptions  showing  that  these  uses  are  not  affected  by  the 
transformation.  We  now  state  a  formal  rule  incorporating  this 
restriction. 

Code  Motion  -  Case  III 

If neither of the two preceding rules apply, then let
@1,...,@n be the places where there are uses of X in expressions
EXPi which are chained to either @ or @'. (It must then be
shown that none of these uses are spoiled by the
transformation.) Introduce a new variable X' to store the
original value of X. Let @i1,...,@im, 1 <= i <= n, be the
places where there occurs a definition of X which reaches @i.
Insert the statement

X' := X;
after each @ij, 1 <= i <= n, 1 <= j <= m, @ij ~= @. Insert
assumptions of the form

|= EXPi = EXPi(X\X')
before each place @i. Then insert statement (1) at @', and at @
change statement (1) to

X' := EXP(X\X');

Subsequently,  once  the  enabling  assumptions  are  verified, 
X'  can  be  eliminated  by  the  basic  deletion  rule. 

Note  that  this  rule  combines  assignment  insertion  and 
assignment  deletion. 

4.1.9  Input Variable Substitution

A  variable  which  is  used  in  a  praa  but  never  modified  is 
called  an  input  variable.  Input  variables  serve  the  role  of 
parameters  of  praas.  If  we  wish  to  adapt  the  praa  to  a 
particular  context,  or  incorporate  it  into  another  praa,  we  can 
use  the  rule  of  input  assertion  substitution  to  replace  each 
input  variable  by  an  appropriate  expression. 

Typically,  the  input  assumption  FM  of  a   praa   specifies 
certain properties of each input variable X; that is, each such
variable  will  occur  as  a  free  variable  of  FM.  To  adapt  a  praa 
for  use  in  some  wider  context  by  substituting  for  X  in  the  body 
of  the  praa  we  must  also  make  the  corresponding  substitution  for 
X  in  FM.  More  precisely:  An  input  variable  X  of  a  praa  can  be 
replaced  by  an  expression  EXP,  and  this  will  preserve 
correctness  provided  that  all  free  variables  of  EXP  are  also 
input  variables. 

This rule can be derived from the Variable Dropout Lemma
[DS77] which allows free variables of input assumptions to be
quantified existentially if they are not used anywhere else in
the praa. To achieve this derivation, we would first strengthen
the input assumption by adding the conjunct

(1)  |= X = EXP
The proposition (1) would then be available everywhere, since
there are no modifications to the variables of EXP. We can then
use substitution to replace all occurrences of X by EXP, and
delete assumption (1) using the variable dropout lemma.

4.1.10  Label  Rules 

In  this  section,  we  specify  rules  which  allow  control  flow 
to  be  manipulated.  The  elementary  rules  (1)  -  (8)  stated  below 
have  the  property  that  they  do  not  alter  the  sequence  of 
computation  states  which  a  program  generates. 

(1)     Branch  Statement  Deletion 

if FM then go to L;     =>     |= ~FM

(2)  Branch Statement Insertion

empty     =>     |= ~FM

                 if FM then go to L;

provided  L  is  defined  in  R. 

(3)  Label Substitution

if FM then go to L;            =>    |= FM -> FM1
...                                  if FM then go to L1;
                                     ...
L: if FM1 then go to L1;             L: if FM1 then go to L1;

(4)

if FM then go to L;            =>    |= FM -> FM1
...                                  if FM then go to L1;
                                     ...
L1: if FM1 then go to L;             L1: if FM1 then go to L;

(5)  Label Substitution II

if FM then go to L;            =>    |= FM -> ~FM1
...                                  if FM then go to L1;
                                     ...
L: if FM1 then go to L2; L1:         L: if FM1 then go to L2; L1:

(6)

if FM then go to L;            =>    |= FM -> ~FM1
...                                  if FM then go to L1;
                                     ...
L1: if FM1 then go to L2; L:         L1: if FM1 then go to L2; L:

(7)  Go  to  Splitting 

if  FM  then  go  to  L;     =>    if  FM  &  FMl  then  go  to  L; 

i f  FM  &  "FMl  then  go  to  L; 

(8)  Go  to  Merging 

if  FM  then  go  to  L;      <=>     if  FM  or  FMl  then  go  to  L; 
1 f  FMl  then  go  to  L; 


The above rules (1) - (8) are quite low level, and rarely
used in this form. We therefore present various derived label
rules which can be obtained by combining sequences of rules
presented in this chapter. These derived rules will be useful
in the case studies to be presented in later sections.


(9)     if FM1 then go to L1;            if FM2 & ~FM1 then go to L2;
        if FM2 then go to L2;    <=>     if FM1 then go to L1;


(10)    if FM then                       if FM then
            SB1                              SB1
        else                                 |= FM1
            SB2                              SB3
        end;                  =>         else
        if FM1 then                          SB2
            SB3                              |= ~FM1
        else                                 SB4
            SB4                          end;
        end;


(11)    if FM then                       if ~FM then
            SB1                              SB2
        else                  <=>        else
            SB2                              SB1
        end;                             end;

(12)    if FM1 then                      |= ~FM1 -> FM2
            if FM2 then                  if FM2 then
                SB1                          SB1
            else                         else
                SB2           =>             SB2
            end;                         end;
        else
            SB1
        end;

(13)    (while FM)                       (while FM)
            if FM1 then                      (while FM & FM1)
                SB1                              SB1
            else              <=>            end;
                SB2                          (while FM & ~FM1)
            end;                                 SB2
        end;                                 end;
                                         end;

(14)    (while FM1)                      (while FM1 & FM2)
            if FM2 then                      SB1
                SB1                      end;
            else              =>
                quit while;
            end if;
        end;


(15)    (while FM1)                      (while FM1)
            SB1                              SB1
            if FM2 then                      if FM2 then
                SB2           =>                 SB2
            else                                 |= ~FM1
                SB3                              quit;
            end if;                          end if;
        end;                                 SB3
                                         end;


(16)    (while FM)                       ( |= FM -> FM1
            SB                =>           while FM1 )
        end;                                 SB1
                                             |= FM2 -> FM
                                             if FM2 then
                                                 SB
                                             end;
                                         end;

        provided that SB1 can be inserted before SB.

The  following  rules  (17)  -  (19)  are  loop  unrolling  rules. 

(17)    (while FM)                       |= FM
            SB1               =>         SB1
        end;                             |= ~FM

(18)    (while FM)                       |= FM
            SB                =>         SB
        end;                             (while FM)
                                             SB
                                         end;

(19)    (forall EXP1 <= X <= EXP2 | FM)
            SB
        end;

            =>

        |= EXP1 <= EXP2
        X := EXP1;
        if FM then
            SB
        end;
        (forall EXP1 < X <= EXP2 | FM)
            SB
        end;

The  following  rule  transforms  while  loops  into  forall  loops. 

(20)    (while exists EXP1 >= x >= EXP2 | FM)
            SB
        end;

            =>


1=  EXP  I  =  EXP 

(forall  EXP  >=  x  >=  EXP2  |  FM) 

SB 
        |=  (A EXP1 >= i >= x | ~FM(x \ i))  &
            (A x >= i >= EXP1 | ~FM(x \ i))
end; 


The  following  rule  embodies  a  simple  loop  fusion  principle. 

(21)    (forall RANGEXP)                 (forall RANGEXP)
            SB1                              SB1
        end;                  <=>            SB2
        (forall RANGEXP)                 end;
            SB2
        end;

provided that the variables modified in SB1 do not appear
in SB2 and vice-versa.
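
Rule (21) can be illustrated outside the praa notation. The following Python sketch is only an illustration and is not part of the rule set: the function names and the loop bodies are assumptions made for the example. Here SB1 modifies only `total` and SB2 modifies only `squares`, so the proviso holds and the two forms compute the same result.

```python
def separate_loops(xs):
    """Left-hand side of rule (21): two loops over the same range."""
    total, squares = 0, []
    for x in xs:                  # SB1: modifies only `total`
        total += x
    for x in xs:                  # SB2: modifies only `squares`
        squares.append(x * x)
    return total, squares


def fused_loop(xs):
    """Right-hand side of rule (21): one loop executing SB1 then SB2."""
    total, squares = 0, []
    for x in xs:
        total += x                # SB1
        squares.append(x * x)     # SB2
    return total, squares
```

If SB2 read `total`, the fused loop would see partial sums rather than the final value, which is exactly the situation the proviso excludes.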


4.1.11  Tail  Recursion  Removal

Recursion  removal  transformations  are  described  by  various 
authors  including  [AS78] ,  [BD75] ,  and  [WS72] .  In  our  case 
studies  we  will  use  the  following  tail  recursion  removal 
transformation: 

Suppose F is a recursive function with input parameters
X1,...,Xk such that a return statement follows each recursive
call. Then we can remove the recursion by replacing all uses of
Xi by uses of new variables Xi', then by inserting the code:

        L:   [X1',...,Xk'] := [X1,...,Xk];

at the entry place, and replacing each recursive call

        F(EXP1,...,EXPk);

by

        X1' := EXP1;
        ...
        Xk' := EXPk;
        go to L;
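
As an illustration only, the transformation can be sketched in Python for a hypothetical two-parameter function (Euclid's algorithm, chosen for the example and not taken from the case studies). Python has no go to, so an unconditional loop plays the role of the jump back to the function body, and the new parameter values are assigned simultaneously so that EXP2 is evaluated with the old value of X1'.

```python
def gcd_recursive(x1, x2):
    """Tail recursive form: the recursive call is immediately returned."""
    if x2 == 0:
        return x1
    return gcd_recursive(x2, x1 % x2)


def gcd_iterative(x1, x2):
    """Same function after tail recursion removal."""
    x1p, x2p = x1, x2                 # [X1', X2'] := [X1, X2]
    while True:                       # stands in for the jump target
        if x2p == 0:
            return x1p
        # X1' := EXP1; X2' := EXP2; go to the top of the loop
        x1p, x2p = x2p, x1p % x2p
```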

4.2  Examples  of  Algorithm  Derivations

To illustrate the use of the transformation rules listed in
the preceding section, we shall now give several examples of
algorithm derivation from a high level specification. Detailed
proofs of these algorithms will not be given; we prove a more
complex case study in Chapter 5.

4.2.1  Example  1  :   Minimum  Cost  Spanning  Tree 

In this example, we derive the greedy algorithm from a high
level algorithm for finding a minimum cost spanning tree of an
undirected graph. Note that, although we omit to do so, we
could derive other minimum cost algorithms from the high level
algorithm with which we will begin; in particular, Dijkstra's
algorithm and an algorithm which uses a uniform selection
policy.

We  first  present  relevant  definitions  and  a  theorem  from 
which  the  correctness  of  the  high  level  praa  follows. 

4.2.1.1  Definitions 

Given an undirected graph g and a cost function defined on
the edges of g, our problem is to find a subset of the edges of
g which is a spanning tree (or forest if g is not connected) of
minimum cost. We represent an undirected graph as a set of
unordered edges, where an unordered edge is a set of two nodes.
We represent a path as a tuple of nodes. The following are our
definitions of unordered graphs, paths, and trees, and of
various functions related to these basic notions.


(D1)   nodes(g)  =  + / g

(D2)   ispath(p, g)  <->  p : tuple(nodes(g))  &
                          (A 1 <= i < #p | {p(i), p(i+1)} in g)

(D3)   paths(g, n1, n2)  =  {p | ispath(p, g)  &
                                 p(1) = n1  &  p(#p) = n2 }

(D4)   simple(p)  <->  p : tuple  &  p : bmap  or
                       (p(1) = p(#p)  &
                        p(1 : #p-1) : bmap)

(D5)   cycle(p)  <->  simple(p)  &  p(1) = p(#p)  &  #p > 1

(D6)   connected(g)  <->  (A x1 in nodes(g), x2 in nodes(g) |
                             paths(g, x1, x2) ~= {} )

(D7)   forest(t)  <->  (A p | ispath(p, t) -> ~cycle(p))

Given a graph g, a spanning tree is formed by repeatedly
adding edges to a set t, initially empty. At any moment in the
construction of t, the nodes of g can be partitioned such that
the nodes of each partition are connected in t, and no two
partitions are connected to each other in t.

We call such partitions max-connected-components.

(D8)   subtree(t, s)  =  {e : e in t | e subset s}

(D9)   connected-component(t, g, s)  <->
           connected(subtree(t, s))  &  nodes(subtree(t, s)) = s

(D10)  max-connected-component(t, g, s)  <->
           connected-component(t, g, s)  &
           (A s1 | s subset s1 -> ~connected-component(t, g, s1))

(D11)  max-connected-components(t, g)  =
           {c : c in pow(g) | max-connected-component(t, g, c)}


Edges which are added to t must connect two different
max-connected-components. We therefore make the following
definition. If s is a subset of the nodes of g, we define the
cut of s as the set of all edges {w, v} such that w is in s and
v is not in s.

(D12)  cut(s, g)  =  {e : e in g | e*s ~= {}  &  e-s ~= {}}

The algorithm terminates when t spans g; that is, when the cut
of each max-connected-component is empty.
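
To make (D1) and (D12) concrete, here is a small Python sketch. It is an illustration under stated assumptions, not part of the derivation: a graph is modelled as a Python set of two-element frozensets, and the function names simply mirror the definitions.

```python
def nodes(g):
    """(D1): the union of all edges of g."""
    return frozenset().union(*g)


def cut(s, g):
    """(D12): edges of g with one endpoint in s and one outside s."""
    return {e for e in g if e & s and e - s}


# A path 1 - 2 - 3 - 4, written as a set of unordered edges.
g = {frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4})}
```

With this graph, `cut(frozenset({1, 2}), g)` contains only the edge {2, 3}: the edge {1, 2} lies wholly inside s, and {3, 4} lies wholly outside it.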


(D13)  spanning-forest(t, g)  <->  forest(t)  &
           (A s in max-connected-components(t, g) | cut(s) = {} )

(D14)  spanning-forests(g)  =  {t | spanning-forest(t, g)}


Suppose a map cost assigns a value to the edges of g. The
cost of a set of edges t is defined by the formula:


(D15)  totalcost(cost, t)  =  + /
           {cost(e) * #{e1 : e1 in t | cost(e1) = cost(e)} : e in t}

A minimum cost spanning forest is then defined as follows.


(D16)  min-cost-span-forest(t, g, cost)  <->
           spanning-forest(t, g)  &  totalcost(cost, t) =
           min / {totalcost(cost, t1) : t1 in spanning-forests(g)}

4.2.1.2  Version  I

Our abstract version of the minimum cost spanning tree
algorithm is proven using the following lemma, which states that
if s is a subset of nodes(g), t a minimum cost spanning tree of
g, and e is an edge in cut(s) of minimum cost, then e is in t or
there is an edge f in cut(s) such that t + {e} - {f} is also a
minimum cost spanning tree.


LEMMA 1.

    min-cost-span-forest(t, g, cost)  &  s subset nodes(g)  &
    e in cut(s)  &  cost(e) = min / cost[cut(s)]  ->
        (E f in cut(s) * t |
            min-cost-span-forest(t+{e}-{f}, g, cost))


To  prove  Lemma  1  informally,  we  assume  t  is  a  minimum  cost 
spanning  tree  without  e.  Adding  e  to  t  forms  a  cycle.  A  new 
tree  t'  can  then  be  formed  by  deleting  an  edge  f,  where  f  is  in 
cut(s)  *  t.  The  cost  of  t'  is  then  less  than  or  equal  to  the 
cost  of  t. 

Lemma  1  suggests  the  following  high  level  algorithm  for 
finding  the  minimum  cost  spanning  tree. 

(1)    |=  cost : smap(g) int

       t := {};
       (while exists s in max-connected-components(t, g) |
                     cut(s, g) ~= {} )
           e := arb {f : f in cut(s, g) |
                     cost(f) = min / cost[cut(s, g)]};
           t := t with e;
       end;

       |-  min-cost-span-forest(t, g, cost)

To verify (1), we add the following loop assumption:

(2)    (E t0 | min-cost-span-forest(t0, g, cost)
               &  t subset t0)

and verify (2) informally as follows. We note first of all that
there is a minimum cost spanning tree for any graph g.
Secondly, (2) is vacuously true when the loop is entered, since
t is empty. We can prove the invariance of (2) by first using
the definition of max-connected-components to show that cut(s,
g) * t = {}, and then by applying Lemma 1.

Next, when execution terminates, we have available
assertion (2) and the assertion:

(3)    (A s in max-connected-components(t, g) |
           cut(s, g) = {} )

From  this  fact,  the  output  assertion  follows. 
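
The high level algorithm (1) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the praa itself: edges are two-element frozensets, `cost` is a dictionary keyed by edge, the helper `components` is a hypothetical stand-in for max-connected-components(t, g), and the choice of s (left open by arb in the praa) is resolved by simply taking the first component with a non-empty cut.

```python
def nodes(g):
    return frozenset().union(*g)


def cut(s, g):
    return {e for e in g if e & s and e - s}


def components(t, g):
    """Maximal sets of nodes of g that are connected by the edges in t."""
    comps = [frozenset({v}) for v in nodes(g)]
    for e in t:
        touched = [c for c in comps if c & e]
        merged = frozenset().union(*touched)
        comps = [c for c in comps if not (c & e)] + [merged]
    return comps


def min_cost_spanning_forest(g, cost):
    t = set()
    while True:
        open_comps = [s for s in components(t, g) if cut(s, g)]
        if not open_comps:            # every cut is empty: t spans g
            return t
        s = open_comps[0]             # any such s is acceptable
        e = min(cut(s, g), key=lambda f: cost[f])
        t.add(e)
```

Recomputing the components on every iteration is deliberately naive; it corresponds to the unoptimized expression max-connected-components(t, g) in the loop header of (1).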


Now  proceeding  transformationally,  we  can  obtain  several 
different  minimum  cost  spanning  tree  algorithms  from  (1)  by 
specifying  how  the  component  s  is  to  be  chosen.  We  can  choose  s 
in  any  of  the  following  three  ways. 

(i) To obtain the greedy algorithm, s is chosen to be a
component such that the minimum cost edge of any cut is in
cut(s, g).

(ii) If we assume that g is a connected graph, then we can
always choose the same component s, and thereby obtain
Dijkstra's algorithm.

(iii) Using a uniform selection policy, s is chosen from a
queue of components, so that all components remain approximately
the same size.

In  the  next  section,  we  show  the  derivation  of  the  greedy 
algorithm  in  detail. 

4.2.1.3  Greedy  Algorithm 

We first apply a reduction in strength optimization to
replace the expression max-connected-components(t, g) in (1).
We introduce a variable vs to store the set
max-connected-components(t, g). Initially, vs is the set {{i} :
i in nodes(g)}. When an edge {v, w} is added to t, vs is
updated by merging the component which contains v and the
component which contains w. Then we replace the expression
max-connected-components(t, g) by vs.

Additionally, we expand the loop header so that the
assignment to s is made explicit. We thereby obtain the praa:

(4)    |=  cost : smap(g) int

       t := {};
       vs := {{i} : i in nodes(g)};
       (while E s in vs | cut(s, g) ~= {} )
@0         s := arb {s : s in vs | cut(s, g) ~= {} };
@1         e := arb {f : f in cut(s, g) | cost(f) =
                     min / cost[cut(s, g)] };
           t := t with e;
           vs := vs less s;
           y := arb {c : c in vs | e in cut(c, g)};
           vs := vs less y;
           vs := vs with s + y;
       end;

       |-  min-cost-span-forest(t, g, cost)


To obtain the greedy algorithm we select a minimum cost
edge e from the union of all cuts, by inserting the following
statement before @0 in (4).

(5)    e := arb {f : f in + / {cut(x, g) : x in vs} |
                 cost(f) = min / cost[ + / {cut(x, g) : x in vs}] };

This transformation is correct since e is dead at @0. We next
use a substitution transformation to replace @0 by

(6)    s := arb {c : c in vs | e in cut(c, g) };

This is valid since

       {c : c in vs | e in cut(c, g) }
           subset  {c : c in vs | cut(c, g) ~= {} }

We now apply null block substitution to delete the statement at
@1. This transformation causes the following assumption to be
introduced.

(7)    |=  e in {f : f in cut(s, g) |
                 cost(f) = min / cost[cut(s, g)] }

which simplifies to

(8)    |=  e in cut(s, g)  &  cost(e) = min / cost[cut(s, g)]

From (6), we can verify that e in cut(s, g) is true. Since

       cost[cut(s, g)]  subset  cost[ + / {cut(x, g) : x in vs}]

is true, we also have

       (cost(e) = min / cost[ + / {cut(x, g) : x in vs}]  &
        e in cut(s, g))  ->  cost(e) = min / cost[cut(s, g)]

We are then left with the praa:


(9)    |=  cost : smap(g) int

       t := {};
       vs := {{i} : i in nodes(g)};
@0     (while E s in vs | cut(s, g) ~= {})
@1         e := arb {f : f in + / {cut(x, g) : x in vs} |
                     cost(f) = min / cost[ + / {cut(x, g) : x in vs}]};
           s := arb {c : c in vs | e in cut(c, g)};
           t := t with e;
           vs := vs less s;
           y := arb {c : c in vs | e in cut(c, g)};
           vs := vs less y;
           vs := vs with s + y;
       end;

       |-  min-cost-span-forest(t, g, cost)


To optimize version (9), we might try to maintain the set

(10)   + / {cut(x, g) : x in vs}

in sorted order as a tuple. However, updating this tuple after
vs is modified might require a search through the tuple to
delete edges which become part of the merged component. We
therefore maintain a set which contains the set (10), but
which may be bigger. We sort the edges initially and, at each
iteration, remove the minimum edge and no other edges. After
removing the minimum edge e, we then test to see whether e is in
the cut of some component. At a later step, an additional data
structure can be used to make this test efficient.

To perform the transformation described above, we first
assume we have a procedure sortbycost and insert the following
statement before @0.

(11)   edgelist := sortbycost(g, cost);

We next make use of the following derived transformation rule
(which is used again in the nodal spans derivation in section
4.2.2).


(12)    (while FM)                       ( |= FM -> FM1
            SB                =>           while FM1 )
        end;                                 SB1
                                             |= FM2 -> FM
                                             if FM2 then
                                                 SB
                                             end;
                                         end;

        provided SB1 can be inserted before SB.


This rule justifies the following substitutions. For
SB1, we substitute the code which selects the first element from
edgelist:

(13)   e := edgelist(1);
       edgelist := edgelist(2 : #edgelist);

and for FM2 we substitute a test which checks whether e is in
the cut of any component:

       e in + / {cut(x, g) : x in vs}.

We also introduce a new while loop condition for FM1 to test
whether edgelist is empty:

       edgelist ~= [].

This gives us the following praa.

(14)   |=  cost : map(g) int

       t := {};
       vs := {{i} : i in nodes(g)};
       edgelist := sortbycost(g, cost);
@0     ( |= (E s in vs | cut(s, g) ~= {}) -> edgelist ~= []
         while edgelist ~= [])
           e := edgelist(1);
           edgelist := edgelist(2 : #edgelist);
@1         |= e in + / {cut(x, g) : x in vs}  ->
                  (E s in vs | cut(s, g) ~= {} )
           if e in + / {cut(x, g) : x in vs} then
@2             e := arb {f : f in + / {cut(x, g) : x in vs} |
                         cost(f) = min / cost[ + / {cut(x, g) : x in vs}] };
               s := arb {c : c in vs | e in cut(c, g)};
               t := t with e;
               vs := vs less s;
               y := arb {c : c in vs | e in cut(c, g)};
               vs := vs less y;
               vs := vs with s + y;
           end;
       end;

       |-  min-cost-span-forest(t, g, cost)

The assumption at @0 can then be proved from the invariant:

(15)   + / {cut(x, g) : x in vs}  subset  range edgelist

Finally, using the null block substitution, we delete statement
@2. This introduces the assumption:

(16)   e in {f : f in + / {cut(x, g) : x in vs} |
             cost(f) = min / cost[ + / {cut(x, g) : x in vs}]};

Statement (16) follows from (15) and the fact that the edgelist
is sorted by cost. We now have the praa:

(17)   |=  cost : map(g) int

       t := {};
       vs := {{i} : i in nodes(g)};
@0     edgelist := sortbycost(g, cost);
       (while edgelist ~= [])
           e := edgelist(1);
           edgelist := edgelist(2 : #edgelist);
@1         if e in + / {cut(x, g) : x in vs} then
               s := arb {c : c in vs | e in cut(c, g)};
               t := t with e;
               vs := vs less s;
               y := arb {c : c in vs | e in cut(c, g)};
               vs := vs less y;
@2             vs := vs with s + y;
           end;
       end;

       |-  min-cost-span-forest(t, g, cost)

As a final improvement, we select a data representation for
vs which makes the operation of finding the component which
contains a particular node efficient, namely that introduced by
Hopcroft, Ullman, and Tarjan in connection with the union/find
problem. (Tarjan's graph reducibility algorithm, described in
Chapter 5, also exemplifies the use of this data structure
transformation.)

To apply this transformation to the greedy algorithm, note
that since vs is a set of disjoint subsets of the nodes of g,
merging components is a disjoint union operation, and testing
whether an edge is in the cut of any component is equivalent to
two find operations.

We represent the nodes of each component in vs as nodes of
a tree in a forest of component trees. A single-valued map
father is introduced to represent the father relationship.
Given a map f : s -> s, we define the limit of f as follows.

       haspath(f, x1, x2)  <->  E p | p : tuple(nodes(f))  &
                                p(1) = x1  &  p(#p) = x2  &
                                (A 1 <= k < #p | [p(k), p(k+1)] in f)

       limit(f, x)  =
           arb {y : y in nodes(f) | haspath(f, x, y)  &  y ~in dom f}

We will use the notation f-lim(x) for limit(f, x).

If f is a function then limit(f, x) is single valued, and
if f contains no cycles, then the limit function is everywhere
defined. In our example, father is a function and contains no
cycles, since it is a tree. For a node z, father-lim(z) is the
root of the tree containing z, and if two nodes x and y are in
the same component, then father-lim(x) = father-lim(y).
Furthermore, if x and y are roots of trees, then the operation

       father(x) := y;

merges the trees.

These remarks allow us to transform our algorithm as
follows. We introduce the statement

       father := {};

before @0, the code

       v := arb e;
       w := arb (e - {v});
       x := father-lim(v);
       z := father-lim(w);

before @1, and the statement

       father(x) := z;

after @2. We then claim that the following assertion is loop
invariant.

(18)   (A v in nodes(g), w in nodes(g) |
           {v, w} in + / {cut(x, g) : x in vs}  <->
           father-lim(v) ~= father-lim(w) )

Using (18) we can replace the expression at @1 by

       x ~= z

and delete assignments to vs, s, and y. This leaves us with the
praa:


(19)   |=  cost : map(g) int

       t := {};
       edgelist := sortbycost(g, cost);
       father := {};
       (while edgelist ~= [])
           e := edgelist(1);
           edgelist := edgelist(2 : #edgelist);
           v := arb e;
           w := arb (e - {v});
           x := father-lim(v);
           z := father-lim(w);
           if x ~= z then
               t := t with e;
               father(x) := z;
           end;
       end;

       |-  min-cost-span-forest(t, g, cost)
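
Version (19) translates almost line by line into Python. The sketch below is illustrative and rests on a few assumptions that are not in the praa: edges are two-element frozensets, `cost` is a dictionary keyed by edge, `sorted` stands in for sortbycost, `father` is a dictionary, and `lim` follows the father map until it reaches a node outside its domain (so a node with no father is its own limit).

```python
def lim(f, x):
    """f-lim(x): follow the single-valued map f from x until leaving dom f."""
    while x in f:
        x = f[x]
    return x


def greedy_spanning_forest(g, cost):
    t = set()
    edgelist = sorted(g, key=lambda e: cost[e])    # sortbycost(g, cost)
    father = {}
    while edgelist:
        e, edgelist = edgelist[0], edgelist[1:]
        v, w = tuple(e)               # v := arb e; w := arb (e - {v})
        x = lim(father, v)
        z = lim(father, w)
        if x != z:                    # e joins two different components
            t.add(e)
            father[x] = z             # merge the two trees
    return t
```

The father trees built here are unbalanced, so `lim` can take time proportional to the number of nodes; this is the inefficiency that the compressed balanced forest described next is meant to remove.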


We now describe a data structure which implements the
union-find operations efficiently (asymptotically n * a(n), where
a is the inverse of Ackermann's function). The corresponding data
structure transformation introduces compressed balanced trees.
Note that this is the type of standard data-structure related
code fragment that one would expect to find in a praa library.

This transformation is as follows. Suppose a map f : s ->
s is used in a praa R such that f is initially empty, and the
only subsequent occurrences of f are of the form:

(20)   |=  y in s
       var := f-lim(y);

and

(21)   |=  x in s  &  z in s  &  x ~in dom f  &  z ~in dom f
       f(x) := z;

Then we can replace f by a compressed balanced forest
representation of it. To represent f as a compressed balanced
forest, we aim to reorganize its ordinary tree representation so
as to keep the level of most of its nodes small (making
root-finding an almost linear time operation). To achieve this,
two things are done. First, whenever two trees are joined, the
smaller tree is always appended to the larger tree, thus balancing
the trees; secondly, whenever an f-lim(x) operation is performed,
the limit node is made the immediate father of all nodes along
the path from x to the limit node.

This leads us to introduce three auxiliary maps:

       fcomp - a forest of compressed trees.

       froot - a mapping from tree roots of fcomp to
               tree roots of f.

       count - the number of descendants of nodes of fcomp.

Throughout, the invariant

       f-lim(x) = froot(fcomp-lim(x))

is maintained. To balance the trees in the forest just
introduced, we proceed as follows. Whenever the statement (21)
is executed, count(fcomp-lim(x)) is compared to
count(fcomp-lim(z)), and the smaller tree is appended to the
larger tree. Froot and count are then adjusted accordingly.
The following code initializes these variables. We assume that
s is the domain of f.


(22)   fcomp := {};
       froot := {[z, z] : z in s};
       count := {[z, 1] : z in s};


Two procedures, balance, which performs balancing, and findlim,
which performs path compression and returns f-lim(x), are
supplied; code for these procedures is as follows.

proc findlim(fcomp1, x1) returns fcomp2, x2;

    |=  A x in s | f-lim(x) = froot(fcomp1-lim(x))  &
        x1 in s

    s1 := {};
    x2 := x1;
    (while x2 in dom fcomp1)
        s1 with x2;
        x2 := fcomp1(x2);
    end;
    fcomp2 := fcomp1;
    (forall z in s1)
        fcomp2(z) := x2;
    end;
    x2 := froot(x2);
    return;

    |-  A x in s | f-lim(x) = froot(fcomp2-lim(x))  &
        x2 = froot(fcomp2-lim(x1))

end proc;


proc balance(froot1, fcomp1, count1, x1, x2)
        returns froot2, fcomp2, count2;

    |=  A x in s | f-lim(x) = froot1(fcomp1-lim(x))  &
        x ~in dom fcomp1  ->
            count1(x) = #{y : y in s | fcomp1-lim(y) = x}

    [fcomp2, r1] := findlim(fcomp1, x1);
    [fcomp2, r2] := findlim(fcomp2, x2);
    froot2 := froot1;
    count2 := count1;
    if count1(r1) < count1(r2) then
        fcomp2(r1) := r2;
        count2(r2) := count2(r1) + count2(r2);
        froot2(r2) := x2;
    else
        fcomp2(r2) := r1;
        count2(r1) := count2(r1) + count2(r2);
        froot2(r1) := x2;
    end;
    return;

    |-  A x in s | froot2(fcomp2-lim(x1)) = x2  &
        fcomp1-lim(x) ~= x1  ->
            fcomp1-lim(x) = fcomp2-lim(x)  &
        x ~in dom fcomp2  ->
            count2(x) = #{y : y in s | fcomp2-lim(y) = x}

end;
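
The two ideas behind these procedures, path compression and linking the smaller tree under the larger, can be sketched in Python as follows. This is a simplified illustration and not a transcription of the procedures above: it mutates `fcomp` and `count` in place instead of returning new maps, it omits froot by working directly with the roots of fcomp, and the parameter lists are assumptions made for the example.

```python
def findlim(fcomp, x):
    """Return the root of x in fcomp, compressing the path that was walked."""
    path = []
    while x in fcomp:
        path.append(x)
        x = fcomp[x]
    for z in path:
        fcomp[z] = x              # each node on the path now points at the root
    return x


def balance(fcomp, count, x1, x2):
    """Join the trees containing x1 and x2, smaller tree under larger."""
    r1, r2 = findlim(fcomp, x1), findlim(fcomp, x2)
    if r1 == r2:
        return                    # already in the same tree
    if count[r1] < count[r2]:
        r1, r2 = r2, r1           # make r1 the root of the larger tree
    fcomp[r2] = r1
    count[r1] += count[r2]


# Initialization in the style of (22), for s = {0, ..., 5}.
fcomp = {}
count = {z: 1 for z in range(6)}
balance(fcomp, count, 0, 1)
balance(fcomp, count, 2, 3)
balance(fcomp, count, 0, 2)
```

After these three joins, nodes 0 to 3 share one root whose count is 4, while nodes 4 and 5 remain roots of their own one-node trees.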


Given these functions, we can replace all statements of the
form (20) by the statement:

(23)   [fcomp, var] := findlim(fcomp, y);

To maintain balance we insert after all statements of the form
(21):

(24)   [froot, fcomp, count] :=
           balance(froot, fcomp, count, x, z);

To prove that this data structure transformation is correct, we
must show that

(i) The input assumption of findlim is satisfied before it
is called at (23).

(ii) The output assertion of findlim implies that

       var = f-lim(y)

and

(iii) The input assumption of the balance routine is
satisfied before it is called in (24). This enables us to use
its output assertion at (24) to prove (i).

Note that the validity of the input assumption of the
findlim procedure follows from the output assertions of findlim
and balance and the assumption (20). Also, since initially f is
empty, f-lim(z) = z for all z in s, so that the input assumption
of findlim is satisfied after initialization. Proof of (ii) is
immediate from these remarks. It is also clear that the input
assumption of the balance routine follows from the output
assertions of findlim and balance, the initialization code, and
the statement (21). This justifies our transformation.

Applying  this  transformation  to  (19),  we  obtain: 


(25)   |=  cost : map(g) int

       t := {};
       edgelist := sortbycost(g, cost);
       fcomp := {};
       froot := {[z, z] : z in nodes(g)};
       count := {[z, 1] : z in nodes(g)};
       (while edgelist ~= [])
           e := edgelist(1);
           edgelist := edgelist(2 : #edgelist);
           v := arb e;
           w := arb (e - {v});
           [fcomp, x] := findlim(fcomp, v);
           [fcomp, z] := findlim(fcomp, w);
           if x ~= z then
               t := t with e;
               [froot, fcomp, count] :=
                   balance(froot, fcomp, count, x, z);
           end;
       end;

       |-  min-cost-span-forest(t, g, cost)



4.2.2  Example  2  :   Nodal  Spans  Algorithm 

We will next derive a version of the nodal spans parsing
algorithm [Schw75]. Suppose we are given a grammar in Chomsky
Normal Form which may be ambiguous. Let its metasymbols be meta
and its terminal symbols constitute the alphabet alpha. Let an
input string input of length n be given. Then the nodal spans
algorithm finds all of the parses of the input string.
Specifically, the algorithm finds a set of triples [i, a, j]
called spans, where a is a metasymbol of the grammar, 1 <= i <=
n, and i < j <= n + 1. More precisely, the span [i, a, j] is
said to be present in the input if input(i:j - 1) can be derived
from a using the grammar rules; the algorithm finds all spans
present in the input string.

We  assume  in  presenting  this  algorithm  that  the  grammar  is 
represented by a relation rules of the form:

       meta -> (alpha + (meta x meta)).

Our  initial  version  of  the  nodal  spans  algorithm  will 
involve  a  function 

       presentspans(p, q, rules, input),

which gives the set of all metasymbols a such that the span [p,
a, p+q] is present in the input. A recursive mathematical
definition for this function is as follows.

       presentspans(p, q, rules, input) =
           if q = 1 then {a : a in meta | input(p) in rules{a}}
           else
               {a : a in meta, [b, c] in rules{a}, 1 <= k < q |
                   b in presentspans(p, k, rules, input)  &
                   c in presentspans(p+k, q-k, rules, input)}
           end
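
The recursive definition can be tried out directly. The following Python sketch is an illustration under stated assumptions rather than part of the derivation: `rules` is a dictionary from each metasymbol to a set of right-hand sides, a terminal is a one-character string, a pair [b, c] is a tuple, the keys of `rules` play the role of meta, and results are memoized with `functools.lru_cache` so that shared subproblems are computed once.

```python
from functools import lru_cache


def make_presentspans(rules, inp):
    """Return presentspans(p, q) for fixed rules and input (p is 1-based)."""

    @lru_cache(maxsize=None)
    def presentspans(p, q):
        if q == 1:
            return frozenset(a for a in rules if inp[p - 1] in rules[a])
        return frozenset(
            a
            for a in rules
            for rhs in rules[a]
            if isinstance(rhs, tuple)          # only productions a -> b c
            for k in range(1, q)
            if rhs[0] in presentspans(p, k)
            and rhs[1] in presentspans(p + k, q - k)
        )

    return presentspans


# Hypothetical grammar for strings of the form a^n b^n, in Chomsky Normal Form.
rules = {
    "S": {("A", "B"), ("A", "T")},
    "T": {("S", "B")},
    "A": {"a"},
    "B": {"b"},
}
ps = make_presentspans(rules, "aabb")
```

For this input, `ps(1, 4)` contains the start symbol S, so the span [1, S, 5] is present and a complete parse exists.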

A  parse  of  the  input  string  exists  if  the  span  [1,  start,  n+1] 
is  present,  where  start  is  the  start  symbol  of  the  grammar. 
Since  we  are  only  interested  in  finding  complete  parses  of  the 
input,  not  all  of  the  spans  which  are  present  in  the  input 
string  are  relevant;  ultimately,  we  are  only  interested  in 
those  which  are  obtainable  from  the  span  [1,  start,  n+1]  by 
successive  decomposition.  Thus  our  algorithm  must  compute  the 
set  of  spans  m  which  are  both  present  and  belong  to  a  parse  of 
input.  These  spans  can  be  obtained  by  taking  the  transitive 
closure of a relation spansfrom, which we now proceed to define.

Spansfrom is a relation which formalizes the notion that,
in order to span a string of terminals starting from the
nonterminal a, there must exist a production a->bc such that b
and c generate complementary fragments of that string. A formal
definition is as follows.

(1)  spansfrom(rules, input) =
       + /
       {{[[i, a, i+t], [i, b, i+k]], [[i, a, i+t], [i+k, c, i+t]]} :
           1 <= t <= n, 1 <= i <= n+1-t, 1 <= k < t, a in meta,
           [b, c] in rules{a} |
               b in presentspans(i, k, rules, input)  &
               c in presentspans(i+k, t-k, rules, input)}

If start is in presentspans(1, n, rules, input), then the set
of spans we wish to calculate is the transitive closure of
spansfrom over {[1, start, n+1]}. For this we use a transitive
closure program, e.g. the following, which it is reasonable to
assume is available in a program library.


(2)   |= r : map(s) s & s0 subset s &
         s0 subset m0 & r[m0] subset m0

      m := s0;
      (while exists x in m | ~ r{x} subset m)
          m := m + r{x};
      end;

      |- s0 subset m & r[m] subset m & m subset m0
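
For readers more familiar with conventional languages, the loop of (2) can be sketched in Python as below. This is only an illustration under an assumed representation: the relation r is a dictionary from an element to the set of its images (so r{x} becomes r.get(x, set())), and the function name closure is hypothetical.

```python
def closure(r, s0):
    """Smallest set m with s0 subset m and r[m] subset m."""
    m = set(s0)
    while True:
        # 'exists x in m | ~ r{x} subset m': look for an element with a missing image.
        x = next((x for x in m if not r.get(x, set()) <= m), None)
        if x is None:
            return m  # r[m] subset m now holds
        m |= r.get(x, set())
```

On exit every x in m satisfies r{x} subset m, which is the output assertion of (2); minimality corresponds to the conjunct m subset m0, where m0 is any set closed under r that contains s0.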


We  will  obtain  a  nodal  spans  algorithm  by  appropriate 
replacement  of  the  input  variables  in  (2).  To  this  end  we 
perform  substitutions  as  follows.  The  set  s  is  replaced  by  the 
domain  of  all  possible  spans,  which  is  defined  as: 

posspans(meta,  n)  =  {[i,  a,  j]  :  a  in  meta,  1  <=  i  <=  n, 

i  <  j  <=  n+1} 

The start set s0 is {[1, start, n+1]} if a parse of the input
string exists.  Otherwise, s0 is null.  We therefore replace s0
by the set startspan defined as:

startspan(start, rules, input) =
    if start in presentspans(1, #input, rules, input)
    then {[1, start, #input+1]} else {}
    end

Finally, the relation r is replaced by spansfrom(rules, input),
and the input assumption is strengthened by adding the
assumption:

      |= input : tuple(alpha) &
         rhs = alpha + {[m1, m2] : m1 in meta, m2 in meta} &
         rules : map(meta) rhs &
         start in meta & n = #input

This brings us to the following program.


(3)   |= input : tuple(alpha) &
         rhs = alpha + {[m1, m2] : m1 in meta, m2 in meta} &
         rules : map(meta) rhs &
         start in meta & n = #input &
      @0 spansfrom(rules, input) : map(posspans(meta, n))
                                       posspans(meta, n) &
      @1 startspan(start, rules, input) subset posspans(meta, n) &
         startspan(start, rules, input) subset m0 &
         spansfrom(rules, input)[m0] subset m0

      m := startspan(start, rules, input);
      (while exists x in m |
              ~ spansfrom(rules, input){x} subset m)
          m := m + spansfrom(rules, input){x};
      end;

      |- startspan(start, rules, input) subset m &
         spansfrom(rules, input)[m] subset m &
         m subset m0


The input assumption of this program can be somewhat simplified
by noting that the conjuncts at @0 and @1 are equivalent to
true.

We now proceed to optimize this algorithm.  First of all,
we introduce a map spans to represent spansfrom.  To this end we
insert the following initialization statement.

(4)   spans := spansfrom(rules, input);

Expanding (4) using the definition of spansfrom, we then obtain:


      spans := +/
        {{[[i, a, i+t], [i, b, i+k]], [[i, a, i+t], [i+k, c, i+t]]} :
          1 <= t <= n, 1 <= i <= n+1-t, 1 <= k < t, a in meta,
          [b, c] in rules{a} |
            b in presentspans(i, k, rules, input) &
            c in presentspans(i+k, t-k, rules, input)}

Using a compilation transformation we expand the above statement
into:


(5)   spans := {};
      (forall 1 <= t <= n)
        (forall 1 <= i <= n+1-t)
          (forall 1 <= k < t)
            (forall a in meta)
              (forall [b, c] in rules{a})
      @0        if b in presentspans(i, k, rules, input) &
                   c in presentspans(i+k, t-k, rules, input) then
                  spans := spans +
                    {[[i, a, i+t], [i, b, i+k]], [[i, a, i+t], [i+k, c, i+t]]};
                end;
              end forall;
            end forall;
          end forall;
        end forall;
      end forall;


Next consider presentspans.  Given that the definition of
presentspans is recursive, we might naturally implement
presentspans as a recursive function, called at statement @0.
However, evaluation of such a function would result in many
redundant invocations, and so the use of a memo map to store
values of presentspans is more efficient.  (We have, in fact,
set up this example to motivate the introduction of a memo map
at this point.)  We therefore introduce a map present to save
computed values of presentspans, and arrange the computation in
such a way as to calculate present(i, j) from values of
present(p, q), where q < j.  In more detail, we begin by
inserting the following initialization statement before (5).


(6)   present := {[i, [t, a]] : 1 <= t <= n, 1 <= i <= n+1-t,
                   a in presentspans(i, t, rules, input)};

As above, we expand (6) using a compilation transformation to
obtain:

(7)   present := {};
      (|- 1 <= y < t, 1 <= x <= n+1-y | present{x, y} =
              presentspans(x, y, rules, input)
       forall 1 <= t <= n)
        (forall 1 <= i <= n+1-t)
      @1  present := present +
            {[i, [t, a]] : a in presentspans(i, t, rules, input)};
        end forall;
      end forall;

We  next  fuse  the  fragment  (7)  with  (5),  and  expand  the  statement 

at  @1  using  the  definition  of  presentspans  in  (7),  obtaining: 

(8)   present := {};
      spans   := {};
      (|- 1 <= y < t, 1 <= x <= n+1-y | present{x, y} =
              presentspans(x, y, rules, input)
       forall 1 <= t <= n)
        (forall 1 <= i <= n+1-t)
      @0  present := present +
            if t = 1 then
              {[i, [t, a]] : a in meta | input(i) in rules{a}} else
              {[i, [t, a]] : a in meta, [b, c] in rules{a}, 1 <= k < t |
                b in presentspans(i, k, rules, input) &
                c in presentspans(i+k, t-k, rules, input)} end;
          (forall 1 <= k < t)
            (forall a in meta)
              (forall [b, c] in rules{a})
                if b in presentspans(i, k, rules, input) &
                   c in presentspans(i+k, t-k, rules, input) then
                  spans := spans +
                    {[[i, a, i+t], [i, b, i+k]], [[i, a, i+t], [i+k, c, i+t]]};
                end;
              end forall;
            end forall;
          end forall;
        end forall;
      end forall;

We then rewrite statement @0 as follows:

(9)   if t = 1 then
        present := present +
          {[i, [t, a]] : a in meta | input(i) in rules{a}};
      else
        present := present +
          {[i, [t, a]] : a in meta, [b, c] in rules{a}, 1 <= k < t |
            b in presentspans(i, k, rules, input) &
            c in presentspans(i+k, t-k, rules, input)};
      end;


The condition t = 1 in statement (9) holds only during the
first iteration of the outer loop.  We therefore unroll the outer
loop once, and simplify the code which is moved out of the loop.
This gives the following code.


(10)  present := {};
      spans   := {};
      (forall 1 <= i <= n)
        present := present +
          {[i, [1, a]] : a in meta | input(i) in rules{a}};
      end forall;

      (|- 1 <= y < t, 1 <= x <= n+1-y | present{x, y} =
              presentspans(x, y, rules, input)
       forall 2 <= t <= n)
        (forall 1 <= i <= n+1-t)
      @0  present := present +
            {[i, [t, a]] : a in meta, [b, c] in rules{a}, 1 <= k < t |
              b in presentspans(i, k, rules, input) &
              c in presentspans(i+k, t-k, rules, input)};
      @1  (forall 1 <= k < t)
            (forall a in meta)
              (forall [b, c] in rules{a})
                if b in presentspans(i, k, rules, input) &
                   c in presentspans(i+k, t-k, rules, input) then
                  spans := spans +
                    {[[i, a, i+t], [i, b, i+k]], [[i, a, i+t], [i+k, c, i+t]]};
                end;
              end forall;
            end forall;
          end forall;
        end forall;
      end forall;


Again,  we  use  a  compilation  transformation  to  expand  the 
statement  at  @0  into  a  loop,  which  we  then  jam  with  the  nested 
loop  at  @1,  giving: 

(11)  present := {};
      spans   := {};
      (forall 1 <= i <= n)
        present := present +
          {[i, [1, a]] : a in meta | input(i) in rules{a}};
      end forall;

@0    (|- 1 <= y < t, 1 <= x <= n+1-y | present{x, y} =
              presentspans(x, y, rules, input)
       forall 2 <= t <= n)
        (forall 1 <= i <= n+1-t)
          (forall 1 <= k < t)
            (forall a in meta)
              (forall [b, c] in rules{a})
@1              if b in presentspans(i, k, rules, input) &
                   c in presentspans(i+k, t-k, rules, input) then
                  present := present + {[i, [t, a]]};
                  spans := spans +
                    {[[i, a, i+t], [i, b, i+k]], [[i, a, i+t], [i+k, c, i+t]]};
                end;
              end forall;
            end forall;
          end forall;
        end forall;
      end forall;

The assertion at @0 enables us to replace the uses of
presentspans at @1 by present.  We can also now replace uses of
spansfrom(rules, input) in our original program by spans.  We
present the entire program below.


(12)  |= input : tuple(alpha) &
         rhs = alpha + {[m1, m2] : m1 in meta, m2 in meta} &
         rules : map(meta) rhs &
         start in meta & n = #input &
         startspan(start, rules, input) subset m0 &
         spansfrom(rules, input)[m0] subset m0

      present := {};
      spans   := {};
      (forall 1 <= i <= n)
        present := present +
          {[i, [1, a]] : a in meta | input(i) in rules{a}};
      end forall;

      (forall 2 <= t <= n)
        (forall 1 <= i <= n+1-t)
          (forall 1 <= k < t)
            (forall a in meta)
              (forall [b, c] in rules{a})
                if b in present{i, k} & c in present{i+k, t-k} then
                  present := present + {[i, [t, a]]};
                  spans := spans +
                    {[[i, a, i+t], [i, b, i+k]], [[i, a, i+t], [i+k, c, i+t]]};
                end;
              end forall;
            end forall;
          end forall;
        end forall;
      end forall;

      m := if start in present{1, n}
           then {[1, start, #input+1]} else {} end;

@0    (while exists x in m | ~ spans{x} subset m)
        m := m + spans{x};
      end;

      |- startspan(start, rules, input) subset m &
         spansfrom(rules, input)[m] subset m &
         m subset m0


Our next goal is to optimize the while loop at @0 by
transforming it into 'workset' code.  This uses the
transformation (16) from section 4.1.10.  Before applying this
transformation, we introduce variables res and work, initialized
by the following statements, which must be inserted before @0.

res  :=  {}; 
work  :=  m; 

Then we apply the transformation, substituting the expression
(exists x in work) for FM1, the expression ~spans{x} subset m
for FM2, and the following code for SB1.

      res := res + {x};
      work := work + spans{x} - res;

This produces the following code fragment.


      m := if start in present{1, n}
           then {[1, start, #input+1]} else {} end;
      res := {};
      work := m;

@0    (|= {x: x in m | ~ spans{x} subset m} ~= {} -> work ~= {}
       while exists x in work)
        res := res + {x};
        work := work + spans{x} - res;
@1      |= ~spans{x} subset m ->
              (x in m | ~ spans{x} subset m)
        if ~spans{x} subset m then
          m := m + spans{x};
        end;
      end;

We can prove that the following assertions are loop invariant.

(13)  {x: x in m | ~spans{x} subset m} subset work

(14)  m = work + res

Given (13), assumption @0 is easily verified.  The assumption at
@1 is also easy to prove.
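
The invariants can also be checked dynamically on small inputs. The following Python sketch is an illustration, not the verified program: it runs the workset loop while still maintaining m, and asserts (13) and (14) after every iteration. The dictionary representation of spans and the function name workset_closure are assumptions of the example.

```python
def workset_closure(spans, start):
    """Workset form of the closure loop, with invariants (13) and (14) asserted."""
    m = set(start)
    res = set()
    work = set(m)
    while work:
        x = next(iter(work))              # 'exists x in work': any element will do
        res.add(x)
        work = (work | spans.get(x, set())) - res
        m |= spans.get(x, set())          # same effect as the guarded update of m
        # (13): every element of m with an image outside m is still in work
        assert {y for y in m if not spans.get(y, set()) <= m} <= work
        # (14): m = work + res
        assert m == work | res
    return res
```

When the loop exits, work is empty, so (14) gives m = res, which is why m can be replaced by res in the output assertion.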


Finally, since work = {} when the loop exits, we can use
(14) to change m to res in the output assertion, and then
eliminate m using the deletion rule.

All  this  brings  us  to  a  final  program: 


(15)  |= input : tuple(alpha) &
         rhs = alpha + {[m1, m2] : m1 in meta, m2 in meta} &
         rules : map(meta) rhs &
         start in meta & n = #input &
         startspan(start, rules, input) subset m0 &
         spansfrom(rules, input)[m0] subset m0

      present := {};
      spans   := {};
      (forall 1 <= i <= n)
        present := present +
          {[i, [1, a]] : a in meta | input(i) in rules{a}};
      end forall;

      (forall 2 <= t <= n)
        (forall 1 <= i <= n+1-t)
          (forall 1 <= k < t)
            (forall a in meta)
              (forall [b, c] in rules{a})
                if b in present{i, k} & c in present{i+k, t-k} then
                  present := present + {[i, [t, a]]};
                  spans := spans +
                    {[[i, a, i+t], [i, b, i+k]], [[i, a, i+t], [i+k, c, i+t]]};
                end;
              end forall;
            end forall;
          end forall;
        end forall;
      end forall;

      work := if start in present{1, n}
              then {[1, start, #input+1]} else {} end;
      res := {};

      (while exists x in work)
        res := res + {x};
        work := work + spans{x} - res;
      end;

      |- startspan(start, rules, input) subset res &
         spansfrom(rules, input)[res] subset res &
         res subset m0


It  is  to  be  noted  that  version  (15)  has  been  obtained  using 
transformations  requiring  very  little  assertion  verification, 
and  the  proofs  of  these  assertions  are  not  difficult.  Version 
(15)  can  be  further  optimized  by  storing  the  set  of  spans  in  a 
more  efficient  way  (see  Schwartz  [Schw75]). 
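
To make the shape of version (15) concrete, here is a hedged Python transcription. It is a sketch under assumed representations, not the verified SETL text: present and spans are dictionaries keyed by tuples, a terminal is a string and a pair [b, c] is a 2-tuple, and the names nodal_spans and RULES (with its toy grammar) are hypothetical.

```python
def nodal_spans(rules, start, text):
    """Spans that are present in text and belong to a parse from start."""
    n = len(text)
    present = {}  # present[(i, t)]: nonterminals covering t symbols from position i
    spans = {}    # spans[parent span]: set of child spans
    for i in range(1, n + 1):
        present[(i, 1)] = {a for a, rhs in rules.items() if text[i - 1] in rhs}
    for t in range(2, n + 1):
        for i in range(1, n + 2 - t):
            cell = present.setdefault((i, t), set())
            for k in range(1, t):
                for a, rhs in rules.items():
                    for alt in rhs:
                        if not isinstance(alt, tuple):
                            continue  # terminals were handled in the first loop
                        b, c = alt
                        if b in present[(i, k)] and c in present[(i + k, t - k)]:
                            cell.add(a)
                            spans.setdefault((i, a, i + t), set()).update(
                                {(i, b, i + k), (i + k, c, i + t)})
    work = {(1, start, n + 1)} if n > 0 and start in present[(1, n)] else set()
    res = set()
    while work:
        x = work.pop()
        res.add(x)
        work = (work | spans.get(x, set())) - res
    return res


# Hypothetical toy grammar: S -> A B | A C,  C -> S B,  A -> 'a',  B -> 'b'
RULES = {
    "S": {("A", "B"), ("A", "C")},
    "C": {("S", "B")},
    "A": {"a"},
    "B": {"b"},
}
```

For the input "aabb" the result contains the root span (1, S, 5) together with the six spans reached from it by successive decomposition; for an input with no parse the result is empty.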


CHAPTER  5 
Case  Study:  Tarjan's  Graph  Reducibility  Test  Algorithm 


A more realistic understanding of the labor which the user
of an actual program verification system would have to expend to
handle significant algorithms requires consideration of more
complex examples than those discussed so far.  Accordingly, in
this chapter, we will study a flow analysis algorithm of Tarjan
[Ta74], which tests a program flow graph for irreducibility in
time proportional to n*a(n), where n is the number of edges in
the graph and a is an inverse of Ackermann's function.

The algorithm is quite subtle, and its proof lengthy and
involved, which makes for a challenging example.  To divide the
task into manageable subparts, we will first prove the correctness
of the algorithm expressed in a high level form, and then apply
transformations to derive a verified fast version of it.
Verification of the high level version will be straightforward
once the mathematical preliminaries have been presented.  At each
transformation step, additional pieces of the proof will have to
be supplied.  In accordance with our desire to model a natural
style of program evolution, our derivation follows the
presentation of the algorithm in [SchS78].

This chapter is divided into two main sections.  In the
first section, we present the algorithm derivation, and state
lemmas from which the proof of the derivation follows.  We prove
these lemmas in formal detail in section 5.2.

5.1  Algorithm Derivation

Before  presenting  the  high  level  version  of  the  algorithm, 
we  define  the  necessary  flow  graph  concepts  formally.  The 
definitions  are  set  up  to  facilitate  the  use  of  a  set-theoretic 
verifier. 

5.1.1  Flowgraph  Definitions  and  Statement  of  High  Level 
Algorithm 

A  flowgraph  is  a  directed  graph  with  a  distinguished  node 
called  its  root,  from  which  there  is  a  path  to  any  other  node  in 
the  graph.  The  definitions  below  develop  flow  graph  concepts 
used  in  the  algorithm.  Variable  g  denotes  a  graph  represented  as 
a  set  of  edges,  where  an  edge  is  an  ordered  pair  of  nodes. 
Variable  p  denotes  a  path  in  a  graph  represented  by  a  tuple  of 
nodes. 

(D1)   nodes(g) = dom g + range g

(D2)   edges(p) = {[p(i), p(i+1)] : 1 <= i < #p}

(D3)   ispath(p, g) <-> p : tuple(nodes(g)) &
                        edges(p) subset g

(D4)   paths(g, n1, n2) = {p | ispath(p, g) &
                           p(1) = n1 & p(#p) = n2}

(D5)   simple(p) <->
         p : tuple & (p : bmap or
                      p(1) = p(#p) & p(1:#p-1) : bmap)

(D6)   cycle(p) <-> simple(p) & p(1) = p(#p) & #p > 1

(D7)   cyclefree(g) <-> (A p | ispath(p, g) -> ~cycle(p))

(D8)   haspath(g, n1, n2) <-> paths(g, n1, n2) ~= {}

(D9)   flowgraph(g, r) <-> (A n in nodes(g) |
                            haspath(g, r, n)) & g : map
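
As a rough executable reading of these definitions, the Python sketch below represents a graph as a set of edge pairs and decides haspath by a search rather than by enumerating paths. The representation, the search strategy, and the example graph G are assumptions of this illustration; the formal definitions above remain the reference.

```python
def nodes(g):
    """(D1): every node that occurs as the source or the target of an edge."""
    return {u for u, _ in g} | {v for _, v in g}


def haspath(g, n1, n2):
    """(D8): some path in g leads from n1 to n2; a one-node path counts."""
    if n1 not in nodes(g):
        return False
    seen, stack = {n1}, [n1]
    while stack:
        u = stack.pop()
        if u == n2:
            return True
        for a, b in g:
            if a == u and b not in seen:
                seen.add(b)
                stack.append(b)
    return False


def cyclefree(g):
    """(D7): no edge [u, v] is closed into a cycle by a path from v back to u."""
    return not any(haspath(g, v, u) for u, v in g)


def flowgraph(g, r):
    """(D9): every node of g is reachable from the root r."""
    return all(haspath(g, r, n) for n in nodes(g))


# Hypothetical example: node 1 is a root; nodes 2 and 3 form a cycle.
G = {(1, 2), (2, 3), (3, 2), (2, 4)}
```

Testing cyclefree through edges is equivalent to (D7) here because any cycle contains at least one edge [u, v] together with a path from v back to u.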


Given  a  relation  r,  the  limitset  of  a  node  x  in  r  is  the  set 
of  all  nodes  y  which  can  be  reached  from  x,  but  from  which  no 
other  node  can  be  reached. 

(D10)  limitset(r, x) =
         {y: y in nodes(r) | haspath(r, x, y) & y ~in dom r}

We note that if r is a function, then the limitset of a node x
contains at most one element, and if the size of r is finite, and
r contains no cycles, then the limitset of every node x is not
empty.  If f is a function, we define the limit of a node x as
follows:

(D11)  limit(f, x) = arb limitset(f, x)

We will use the notation f-lim(x) for limit(f, x).
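
A small Python sketch may help fix the idea. It assumes a relation is a set of pairs, and it models arb by requiring the limitset to hold exactly one element, which is the situation described above for a finite, cycle-free function; the example map PARENT is hypothetical.

```python
def limitset(r, x):
    """(D10): nodes reachable from x in r that have no outgoing edge."""
    reached, stack = {x}, [x]
    while stack:
        u = stack.pop()
        for a, b in r:
            if a == u and b not in reached:
                reached.add(b)
                stack.append(b)
    dom = {a for a, _ in r}
    nodes_r = dom | {b for _, b in r}
    return {y for y in reached if y in nodes_r and y not in dom}


def limit(f, x):
    """(D11): the unique element of limitset(f, x); raises ValueError otherwise."""
    (y,) = limitset(f, x)
    return y


# Hypothetical child-to-parent map of a small tree rooted at node 1.
PARENT = {(2, 1), (3, 2), (4, 2)}
```

With PARENT playing the role of the inverse of a tree, the limit of every node is the root, which is the property used later to define trees.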

Given  a  flowgraph  g,  Tarjan's  algorithm  will  attempt  to 
discover  a  subset  of  the  nodes  of  g  which  is  a  strongly  connected 
interval.  A  strongly  connected  interval  i  is  a  subset  of  the 
nodes  of  g  having  the  following  properties: 

(i)  i contains more than one node.

(ii)  There is a distinguished node in i called the head,
such that any cycle in g containing only nodes in i also contains
the head.

(iii)  The set i is strongly connected; that is, for any two
nodes in i, there is a path p in g connecting these nodes, such
that all nodes of p are in i.

(iv)  The set i has a single entry point x; that is, if g
contains an edge [u, v] such that u ~in i and v in i, then v = x.

These  properties  are  expressed  in  the  notion  of 
strong-interval,  defined  below. 


(D12)  strong-interval(i, g, x) <->
         #i > 1 &
         (A p | ispath(p, g) & range p subset i & cycle(p) ->
            x in range p) &
         (A n1 in i, n2 in i | E p in paths(g, n1, n2) |
            range p subset i) &
         (A [u, v] in g | (u ~in i & v in i) -> v = x)


Once  a  strong  interval  is  found,  the  graph  g  can  be 
collapsed  by  deleting  from  it  the  edges  that  are  internal  to  the 
interval,  and  by  replacing  each  edge  connecting  an  internal  node 
of  the  interval  and  one  of  its  external  nodes  with  an  edge  from 
the  head  to  the  external  node. 

We define a function collapse as follows:

(D13)  collapse(g, i, x) = {[u, v] : [u, v] in g | u ~in i} +
                           {[x, v] : [w, v] in g | w in i & v ~in i}

We also supply the following useful relation.


(D14)  collapsed(g1, g2) <->
         (E i in pow nodes(g1), x in i | strong-interval(i, g1, x) &
            g2 = collapse(g1, i, x))
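
The definitions (D12) and (D13) can be exercised on small graphs with the following Python sketch. It is an illustration under stated assumptions: graphs are sets of edge pairs, the condition on cycles is tested as "the subgraph induced on i - {x} has no cycle", and the helper names induced, reachable and has_cycle are hypothetical.

```python
def induced(g, s):
    """Edges of g with both end points in s."""
    return {(u, v) for u, v in g if u in s and v in s}


def reachable(g, n1):
    """Nodes reachable from n1 in g, including n1 itself."""
    seen, stack = {n1}, [n1]
    while stack:
        u = stack.pop()
        for a, b in g:
            if a == u and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def has_cycle(g):
    return any(u in reachable(g, v) for u, v in g)


def strong_interval(i, g, x):
    """(D12), with x in i as required by (D14)."""
    inner = induced(g, i)
    return (len(i) > 1 and x in i
            and not has_cycle(induced(g, i - {x}))           # cycles in i pass through x
            and all(i <= reachable(inner, n1) for n1 in i)    # strongly connected within i
            and all(v == x for u, v in g if u not in i and v in i))  # single entry


def collapse(g, i, x):
    """(D13): keep edges leaving nodes outside i; redirect exits of i to leave from x."""
    return ({(u, v) for u, v in g if u not in i}
            | {(x, v) for w, v in g if w in i and v not in i})
```

For example, in the graph {(0, 1), (1, 2), (2, 1), (2, 3)} the set {1, 2} with head 1 is a strong interval, and collapsing it leaves the chain 0, 1, 3.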


If g can be reduced to a graph with no cycles by repeated
application of interval collapsing, then g is said to be
reducible.  Otherwise, successive collapsing will eventually
produce a graph containing a cycle, but no strong interval.
Before defining the notion of reducibility, we define a relation
over flowgraphs called derivable(g), where [g1, g2] is in this
relation iff g2 can be obtained from g1 by collapsing.

(D15)  derivable(g) =
         {[g1, g2] : g1, g2 | g1 : map(nodes(g)) nodes(g) &
            g2 : map(nodes(g)) nodes(g) & collapsed(g1, g2)}

Finally, a graph g is reducible if there is a path from g to a
graph g1 with no cycles such that the path is in derivable(g).

(D16)  reducible(g, r) <->
         flowgraph(g, r) & (E g1 | cyclefree(g1) &
            haspath(derivable(g), g, g1))


Given  this  definition,  a  naive  method  for  testing  flow  graph 
reducibility  is  to  successively  collapse  g  in  all  possible  ways, 
and  test  the  collapsed  limit  graphs  for  cycles.  However,  the 
lemma  below  indicates  that  this  potentially  exponential  search  is 
unnecessary. 

LEMMA 1.

      collapsed(g1, g2) ->
         (reducible(g1, r) <-> reducible(g2, r))

We now state a recursive algorithm for testing flow graph
reducibility, from which we will derive Tarjan's algorithm.  The
algorithm, which repeatedly collapses g until either g is reduced
or it is unable to collapse g further, is a restatement of Lemma
1.

(1)   proc testreduce(g) returns reduceflag;

        |= flowgraph(g, r) & r ~in range g &
           (A e in g | e(1) ~= e(2))

        if cyclefree(g) then
          return true;
        else
          if exists g1 | collapsed(g, g1) then
            return testreduce(g1);
          else
            return false;
          end;
        end;

        |- reduceflag <-> reducible(g, r)

      end;


The formal details of the proof of (1) are given in the next
section of this chapter.  For technical reasons, we have added
the harmless assumption that the root is not the target of any
edge.

5.1.2  Tree  Definitions 

Tarjan's method for testing a flow graph for reducibility
uses a depth first spanning tree of g, and so we now proceed to
define various concepts relating to trees.  A tree can be
regarded as a flowgraph such that each node except the root has
exactly one edge leading into it.  It is more convenient for us
to define a tree to be a map whose inverse f is single valued,
and which is such that f-lim(x) = f-lim(y) for all nodes x and y
in the tree, implying that there are no cycles.


(D17)   tree(t,  r)  <->   t-inv  :  smap   & 

(A  n  in  nodes (t)  |   t-inv-lim(n)  =  r) 

(D18)  numdescs(t,  x)  =  #{z:z  in  nodes(t)  |  haspath(t,  x,  z)} 


The  following  definitions  formalize  the  notion  of  depth 
first  spanning  tree.  A  depth  first  spanning  tree  of  a  graph  has 
an  associated  map  n,  defined  on  the  nodes  of  g,  specifying  the 
order  in  which  the  nodes  would  be  visited  during  a  standard 
preorder  walk.  This  leads  us  to  the  following  preliminary 
definition.  Suppose  a  map  n  numbers  the  nodes  of  a  tree.  Then  n 
is  a  tree-node  map  if  for  all  x,  y,  if  y  is  a  descendant  of  x, 
then 

n(x)  <=  n(y)  <  n(x)  +  numdescs(t,  x) 
That  is, 


(D19)  tree-node-map(t, n, r) <-> tree(t, r) &
         n : smap & dom n = nodes(t) &
         range n = {k: 1 <= k <= #nodes(t)} &
         (A x, y | haspath(t, x, y) ->
            n(x) <= n(y) & n(y) < n(x) + numdescs(t, x))

(D20)  spantree(t, g, r) <-> tree(t, r) &
         nodes(t) = nodes(g) & t subset g

(D21)  depth-first-span(t, g, n, r) <-> spantree(t, g, r) &
         tree-node-map(t, n, r) & (A [x, y] in g |
            haspath(t, x, y) or
            haspath(t, y, x) or n(x) > n(y))

(D22)  depth-first-trees(g, r) =
         {[t, n] : t, n | depth-first-span(t, g, n, r)}

The  following  definitions  describe  properties  of  edges   and 
nodes  of  a  graph  with  respect  to  a  depth  first  spanning  tree. 

(D23)  backedge(e, t) <-> haspath(t, e(2), e(1))

(D24)  target-back-edge(x, t, g) <->
         (E u | [u, x] in g & backedge([u, x], t))

(D25)  targ-back-edges(t, g) =
         {x: x in nodes(g) | target-back-edge(x, t, g)}


If there are no backedges, then there are no cycles in
the graph g, and so g is cyclefree.  That is,

LEMMA  2. 

The following formulae are equivalent:

(i)    E [t, n] in depth-first-trees(g, r), e in g |
         backedge(e, t)

(ii)   ~cyclefree(g)

(iii)  A [t, n] in depth-first-trees(g, r) | E e in g |
         backedge(e, t)

It follows that

      (A [t, n] in depth-first-trees(g, r) |
           targ-back-edges(t, g) = {})
      <->
      cyclefree(g)
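
The statement of Lemma 2 can be tried out on small examples with the Python sketch below. It builds one particular depth first spanning tree by a standard recursive preorder walk and collects the back edge targets of (D25). The visiting order (successors in sorted order), the representation of graphs as sets of pairs, and the function names are assumptions of this illustration.

```python
def dfs_tree(g, r):
    """Return (t, n): tree edges and preorder numbers of a depth first walk from r."""
    succ = {}
    for u, v in sorted(g):
        succ.setdefault(u, []).append(v)
    t, n = set(), {}

    def visit(u):
        n[u] = len(n) + 1
        for v in succ.get(u, []):
            if v not in n:
                t.add((u, v))
                visit(v)

    visit(r)
    return t, n


def is_ancestor(t, a, d):
    """haspath(t, a, d) for a tree t: follow parent links upward from d."""
    parent = {v: u for u, v in t}
    while d != a and d in parent:
        d = parent[d]
    return d == a


def targ_back_edges(t, g):
    """(D25): targets x of edges [u, x] in g such that x is a tree ancestor of u."""
    return {v for u, v in g if is_ancestor(t, v, u)}
```

On a graph containing the cycle 2, 3, 2 the set of back edge targets is non-empty, whereas on an acyclic graph it is empty, in agreement with the lemma.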


We therefore can replace the condition cyclefree(g) in (1) by
the condition

      (A [t, n] in depth-first-trees(g, r) |
           ~E x in targ-back-edges(t, g))

to obtain

(2)   proc testreduce(g) returns reduceflag;

        |= flowgraph(g, r) & r ~in range g &
           (A e in g | e(1) ~= e(2))

@0      if (A [t, n] in depth-first-trees(g, r) |
              ~E x in targ-back-edges(t, g)) then
          return true;
        else
          if exists g1 | collapsed(g, g1) then
            return testreduce(g1);
          else
            return false;
          end;
        end;

        |- reduceflag <-> reducible(g, r)

      end;


Our next aim is to introduce the graph reducibility
criterion used in Tarjan's algorithm.  To prepare for this
transformation step, we restructure the if statement at @0 in (2)
using the rule:

      if FM then              if ~FM then
        SB1                     SB2
      else          <=>       else
        SB2                     SB1
      end;                    end;

The new form of the if statement after this transformation is:

(3)   if (E [t, n] in depth-first-trees(g, r) |
            E x in targ-back-edges(t, g)) then

We next transform (3) into an 'if exists' statement, which has
the side effect of assigning values to t, n, and x.  This gives
the praa:

(4)   proc testreduce(g) returns reduceflag;

        |= flowgraph(g, r) & r ~in range g &
           (A e in g | e(1) ~= e(2))

        if exists [t, n] in depth-first-trees(g, r),
                  x in targ-back-edges(t, g) then
          if exists g1 | collapsed(g, g1) then
@0          return testreduce(g1);
          else
            return false;
          end;
        else
          return true;
        end;

        |- reduceflag <-> reducible(g, r)
      end;


5.1.3  Tarjan's  Reducibility  Criterion 


To  specify  Tarjan's  test  for  reducibility,  we  define  the  map 
reachunder  as  follows. 

Suppose  that  r  is  the  root  of  a  flowgraph  g,  and  t  a  depth 
first  spanning  tree  for  g.  Given  a  node  x,  reachunder (g,  t,  x) 
is  the  set  of  nodes  connected  to  x  by  a  path  p,  such  that  p  does 
not  contain  x,  except  as  an  end  point,  and  the  last  edge  of  the 
path  is  a  back  edge.  We  define  reachunder  in  terms  of  the 
inverse  of  g  as  follows: 


(D26)  reachunder(g, t, x) = {y: y in nodes(g) |
         (E z in nodes(g) | backedge([z, x], t) &
            [z, x] in g &
            haspath({[x2, x1] : [x2, x1] in g-inv | x1 ~= x}, z, y))}
In other words, reachunder(g, t, x) is the transitive closure of
the relation
      {[n2, n1] : [n2, n1] in g-inv | n1 ~= x}
over the initial set
      {z: z in nodes(g) | backedge([z, x], t) & [z, x] in g}.
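
This closure reading of reachunder translates directly into a backward search, sketched below in Python. The sketch assumes graphs and trees are sets of edge pairs, that g has no self loops (as the input assumption of testreduce requires), and that the tree t is supplied by the caller; the function name and local helper are hypothetical.

```python
def reachunder(g, t, x):
    """Nodes that reach x by a path avoiding x whose last edge is a back edge into x."""
    parent = {v: u for u, v in t}

    def is_ancestor(a, d):  # haspath(t, a, d) for a tree t
        while d != a and d in parent:
            d = parent[d]
        return d == a

    # Initial set: sources z of back edges [z, x].
    start = {z for z, v in g if v == x and is_ancestor(x, z)}
    pred = {}
    for u, v in g:
        pred.setdefault(v, set()).add(u)
    reach, work = set(start), list(start)
    while work:
        v = work.pop()
        for u in pred.get(v, ()):
            if u != x and u not in reach:  # follow g backwards, never through x
                reach.add(u)
                work.append(u)
    return reach
```

In the graph {(1, 2), (1, 3), (2, 3), (3, 2)} with tree {(1, 2), (2, 3)} and x = 2, the search started from the back edge source 3 reaches the root 1 without passing through 2, which is the situation Tarjan's criterion uses to signal irreducibility.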

Then  Lemma  3  below  states  that  a  graph  g  is  not  reducible 
iff  there  is  a  node  x  such  that  the  root  of  the  graph  is  in 
reachunder (g,  t,  x) . 

LEMMA 3.

      flowgraph(g, r) ->
         (~reducible(g, r) <-> (E x in nodes(g) - {r},
                                  [t, n] in depth-first-trees(g, r) |
                                r in reachunder(g, t, x)))

At this point we incorporate the reducibility test specified
by Lemma 3 into our algorithm as follows.  We repeatedly find
strongly connected intervals, and collapse the graph by removing
these intervals until either there are no more backedges, or
until we detect that the graph is not reducible.  More precisely,
at each recursive invocation, we test for the existence of a back
edge target x.  If such an x exists, the graph is not reduced.
We then apply the test specified in Lemma 3; namely, we test
whether the root is in the reachunder set for x.  If it is, the
graph is not reducible, and we quit.  Otherwise, we test for a
strong interval and collapse the graph if one exists as before.
We obtain this algorithm from the praa (4) in the following
manner.  We replace the statement at @0 in (4) by the code:

(5)  if  r  in  reachunder (g,  t,  x)  then 

return  false  ; 
else 

return  testreduce(gl) ; 
end; 

Since we are replacing a single statement by a block of code,
this transformation is an application of the block substitution
rule.  The statement replaced contains a function call, and to
justify the transformation we must therefore show:

(i)  The output assertion of the function is satisfied after
the inserted code (5) and,

(ii)  The function terminates.

In this case, to prove (i), we must show that the assertion

(6)   |- reduceflag <-> reducible(g1, r)

is available after (5).  To prove (6), we note that if r in
reachunder(g, t, x), then ~reducible(g1, r) follows from Lemma 3.
Otherwise, if r ~in reachunder(g, t, x), testreduce is called
recursively as before, so (6) holds.

Secondly, the function testreduce terminates since there are
no loops in the program text, and the size of g1 is decremented
at each recursive call.  This fact is proved in the next chapter.

The replacement just described yields:


(7)   proc testreduce(g) returns reduceflag;

        |= flowgraph(g, r) & r ~in range g &
           (A e in g | e(1) ~= e(2))

@0      if exists [t, n] in depth-first-trees(g, r),
                  x in targ-back-edges(t, g) then
@1        if exists g1 | collapsed(g, g1) then
            if r in reachunder(g, t, x) then
              return false;
            else
@2            return testreduce(g1);
            end;
          else
            return false;
          end;
        else
          return true;
        end;

        |- reduceflag <-> reducible(g, r)

      end;


This  algorithm  is  an  initial,  abstract  skeleton  of  the  more 
refined  algorithm  of  Tarjan's  to  which  we  are  heading.  In  the 
remaining  sections  we  will  develop  an  efficient  form  of  this 
algorithm  incrementally.  In  its  final  version,  we  will  construct 
a  depth  first  spanning  tree  only  once,  and  we  will  not  explicitly 
collapse  the  graph  g.  This  algorithm's  remarkable  efficiency  is 
obtained  by  representing  the  derived  graph  sequence  implicitly  as 
a  forest  of  compressed  balanced  trees. 

The  major  transformation  steps  still  to  be  performed  are   as 
follows. 
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1.)  We  first  introduce  a  metnod  for  finding  a  strongly 
connected  interval-  Namely,  if  x  is  the  back  edge  target  with 
the  highest  node  number,  and  r  is  not  in  the  reachunder  set  1  of 
X,  then  1  Is  a  strongly  connected  Interval. 

2.) As a second step, we supply a transitive closure
algorithm for computing reachunder.

3.) As written above, our algorithm computes a depth first
spanning tree for g every time g is collapsed. However, we can
apply a formal differentiation transformation to differentially
update the depth first spanning tree with respect to the
collapsing of g. After this the update code can be eliminated;
that is, the depth first spanning tree need be calculated only
once.

4.)  Our  fourth  transformational  step  eliminates  explicit 
collapsing  of  the  graph  g  by  reformulating  the  algorithm  in  terms 
of  a  union-find  operation.  In  this  context  we  will  introduce  a 
map  head  which  will  represent  the  derived  graph  sequence 
implicitly. 

5.) Efficient implementations for union-find algorithms are
well-known [AHU74], [Ta74], [SchS78], and so in a final step we
can realize Tarjan's algorithm by implementing this head map
using compressed balanced trees. In our final transformation
steps, we also perform various cleanup optimizations, and supply
a procedure for computing the depth first spanning tree.

5.1.4 Step I - Method for Obtaining a Strong Interval

We  now  take  a  first  transformational  step  by  specifying  how 
a  collapsed  graph  is  to  be  obtained.  Here  we  make  use  of  the 
following  lemma. 

Let  g  be  a  flowgraph  with  root  r,  t  a  depth  first  spanning 
tree  of  g  with  node  numbering  map  n,  and  x  the  backedge  target 
with  the  largest  node  number.  Then  reachunder (g,  t,  x)  +  {x} 
either  includes  the  graph  root  or  is  a  strong  interval.  In 
formal  notation,  this  lemma  is: 

LEMMA 4.

    flowgraph(g, r) & [t, n] in depth-first-trees(g, r) &
    target-back-edge(x, t, g) & r ~in reachunder(g, t, x) &
    n(x) = max / n[targ-back-edges(t, g)]
    ->
    strong-interval(reachunder(g, t, x) + {x}, g, x)
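
To make the choice of x in Lemma 4 concrete, here is a minimal Python sketch (Python is used only for illustration; it is not the SETL variant of this thesis). It assumes that a graph is a set of (source, target) pairs, that node numbers are depth first preorder numbers starting at 1, and that successors are visited in sorted order. The names dfs_tree, is_ancestor and highest_back_edge_target are hypothetical helpers introduced for this example.

```python
def dfs_tree(g, r):
    """Return (parent, number) for one depth first spanning tree of g rooted at r."""
    succ = {}
    for u, v in sorted(g):
        succ.setdefault(u, []).append(v)
    parent, number = {r: None}, {}

    def visit(u):
        number[u] = len(number) + 1      # preorder number, starting at 1
        for v in succ.get(u, []):
            if v not in number:
                parent[v] = u
                visit(v)

    visit(r)
    return parent, number


def is_ancestor(parent, a, d):
    """True if a lies on the tree path from the root to d."""
    while d is not None:
        if d == a:
            return True
        d = parent[d]
    return False


def highest_back_edge_target(g, r):
    """Return the back edge target with the largest node number, or None."""
    parent, number = dfs_tree(g, r)
    # An edge [u, v] is treated as a back edge when v is a tree ancestor of u.
    targets = {v for (u, v) in g if is_ancestor(parent, v, u)}
    return max(targets, key=number.get) if targets else None
```

On a graph with two nested loops headed by a and b, where b is numbered after a, the sketch selects b, the head of the inner loop.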

Therefore if we choose x to be the backedge target with the
highest node number, then if r ~in reachunder(g, t, x), the set

    reachunder(g, t, x) + {x}

forms an interval, and we can collapse g by removing this
interval. To achieve this goal, we proceed in the following
manner.

By  definition  (D26) ,  the  condition  at  @0  in  (7)  of  the  last 
section  is  equivalent  to 


(1)    exists [t, n] in depth-first-trees(g, r),
              x in {y : y in nodes(g) | target-back-edge(y, t, g)}

Statement (1) in turn simplifies to

(2)    exists [t, n] in depth-first-trees(g, r),
              x in nodes(g) | target-back-edge(x, t, g)


Since  the  domain  of  the  map  n  is  nodes (g),  we  can  apply   the 
transformation 


                                    |= f-inv : smap
(exists x in dom f | FM)     =>     (exists y in range f | FM(x\f-inv(y)))
                                    x := f-inv(y);


to  (2)  to  obtain 


(3)    if exists [t, n] in depth-first-trees(g, r),
                 y in range n | target-back-edge(n-inv(y), t, g) then
           x := n-inv(y);

The range of n is the set {i : 1 <= i <= #nodes(g)},
so we can replace (3) by the code:

(4)    if exists [t, n] in depth-first-trees(g, r),
                 #nodes(g) >= y >= 1 | target-back-edge(n-inv(y), t, g)
       then
           x := n-inv(y);


Note that this iterator scans y downward from #nodes(g), so it
chooses the maximal y in the iterator range which satisfies the
condition. Therefore, the assertion below is available after (4).

(5)    |- y = max / {y : y in range n |
                     target-back-edge(n-inv(y), t, g)}

From (5), we conclude

(6)    n(x) = max / n[targ-back-edges(t, g)]
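
The step from the descending iterator in (4) to assertion (5) can be illustrated with a small Python sketch. It assumes only that the iterator tries y = hi, hi - 1, ..., 1 in that order and stops at the first y satisfying the condition; first_descending is a hypothetical name introduced for this example.

```python
def first_descending(hi, condition):
    """Return the first y in hi, hi-1, ..., 1 satisfying condition, or None."""
    for y in range(hi, 0, -1):
        if condition(y):
            return y
    return None
```

Whenever some y in the range satisfies the condition, the value returned equals the maximum of the satisfying values, which is the content of (5).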

Having selected x in this manner, we know using Lemma 4 that
reachunder(g, t, x) + {x} is a strong interval whenever
r ~in reachunder(g, t, x). We next modify praa (7) in 5.3.3 so
that g is collapsed using this interval as follows. We make
explicit the assignment statement

(7)    gl := arb {gl | collapsed(g, gl)};

after @1 and move it to the place before @2. We then use block
substitution to replace (7) by the code:

       i := reachunder(g, t, x) + {x};
       gl := collapse(g, i, x);

Before @2, the assertion r ~in reachunder(g, t, x) is available,
so Lemma 4 can be applied to verify this block substitution. We
now have the following praa.

(8)   proc testreduce(g) returns reduceflag;

      |= flowgraph(g, r) & r ~in range g &
         (A e in g | e(1) ~= e(2))

      if exists [t, n] in depth-first-trees(g, r),
         #nodes(g) >= y >= 1 | target-back-edge(n-inv(y), t, g) then
          x := n-inv(y);
          if E gl | collapsed(g, gl) then
              if r in reachunder(g, t, x) then
                  return false;
              else
                  i := reachunder(g, t, x) + {x};
                  gl := collapse(g, i, x);
                  return testreduce(gl);
              end;
          else
              return false;
          end;
      else
          return true;
      end;

      |- reduceflag <-> reducible(g, r)
      end;

We can simplify (8) by noting that r ~in reachunder(g, t, x) implies

    E gl | collapsed(g, gl)

and using the following derived transformation rule.

    if FM1 then                      |= ~FM1 -> FM2
        if FM2 then                  if FM2 then
            SB1                          SB1
        else               =>        else
            SB2                          SB2
        end;                         end;
    else
        SB1
    end;
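
The role of the enabling assumption in this rule can be checked by brute force over the four truth assignments. The Python sketch below is only an illustration: the blocks SB1 and SB2 are modelled as strings, and the function names nested, flat and rule_is_sound are hypothetical.

```python
from itertools import product


def nested(fm1, fm2):
    """Left-hand side of the rule."""
    if fm1:
        if fm2:
            return "SB1"
        else:
            return "SB2"
    else:
        return "SB1"


def flat(fm2):
    """Right-hand side of the rule, which no longer tests FM1."""
    return "SB1" if fm2 else "SB2"


def rule_is_sound():
    """Both sides agree on every assignment satisfying ~FM1 -> FM2."""
    return all(nested(fm1, fm2) == flat(fm2)
               for fm1, fm2 in product([False, True], repeat=2)
               if fm1 or fm2)          # (fm1 or fm2) is ~FM1 -> FM2
```

The only assignment on which the two sides differ is FM1 = FM2 = false, which is exactly the case excluded by the assumption.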

This  gives  the  following  praa. 


(9)   proc testreduce(g) returns reduceflag;

      |= flowgraph(g, r) & r ~in range g
         & (A e in g | e(1) ~= e(2))

      if exists [t, n] in depth-first-trees(g, r),
         #nodes(g) >= y >= 1 | target-back-edge(n-inv(y), t, g) then
          x := n-inv(y);
@0        |= r ~in reachunder(g, t, x) ->
                 E gl | collapsed(g, gl)
          if r in reachunder(g, t, x) then
              return false;
          else
              i := reachunder(g, t, x) + {x};
              gl := collapse(g, i, x);
              return testreduce(gl);
          end;
      else
          return true;
      end;

      |- reduceflag <-> reducible(g, r)
      end;


The  assumption  introduced  at  @0  can  be  verified  from  Lemma 
4. 

Finally,  we  obtain  the  following  iterative  praa  from  (9)  by 
applying  tail  recursion  removal,  code  motion,  and  while  loop 
formation. 

(10)  proc testreduce(g) returns reduceflag;

      |= flowgraph(g, r) & r ~in range g &
         (A e in g | e(1) ~= e(2))

      gO := g;
      (while exists [t, n] in depth-first-trees(g, r),
             #nodes(g) >= y >= 1 | target-back-edge(n-inv(y), t, g))
          x := n-inv(y);
          if r in reachunder(g, t, x) then
              return false;
          else
@0            i := reachunder(g, t, x) + {x};
              g := collapse(g, i, x);
          end;
      end;
      return true;

      |- reduceflag <-> reducible(gO, r)

      end;

5.1.5 Step II - Method for Calculating Reachunder

As  a  next  step,  we  will  supply  a  transitive  closure 
procedure  to  compute  the  set  reachunder (g,  t,  x)  +  {x}. 

We  begin  with  the  version  of  transitive  closure  given  below. 

(1)   |= r : map(v) v & sO subset v

      new := sO;
      res := {};
      (while new ~= {})
          w from new;
          res := res with w;
          new := new + (r{w} - res);
      end;

      |- res = {x : x in v | (E z in sO |
                   haspath(r, z, x))}
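
The worklist scheme of (1) translates almost line for line into Python. The sketch below is an illustration under two assumptions: the multivalued map r is represented as a dictionary from a node to the set of its images, and haspath is taken to include the empty path, so that the members of sO themselves belong to the result. The name closure is hypothetical.

```python
def closure(r, s0):
    """Return every node reachable from s0 along r (including s0 itself)."""
    new = set(s0)
    res = set()
    while new:
        w = new.pop()                   # "w from new"
        res.add(w)                      # "res := res with w"
        new |= r.get(w, set()) - res    # "new := new + (r{w} - res)"
    return res
```

Each node enters res at most once and is never put back into new afterwards, so the loop terminates after at most #v iterations.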

To  adapt  (1)  to  compute  reachunder,  we  substitute  the  set 
nodes(g)  for  v,  {z  |  [z,  x]  in  g  &  backedge([z,  x] ,  t)}  for  sO, 
and   the   expression 

    {[n2, n1] : [n2, n1] in g-inv | n1 ~= x}

for r to obtain the following praa:

(2)   |= {[n2, n1] : [n2, n1] in g-inv | n1 ~= x} : map(nodes(g)) nodes(g)
         & {z | [z, x] in g & backedge([z, x], t)} subset nodes(g)

      new := {z | [z, x] in g & backedge([z, x], t)};
@1    res := {};
      (while new ~= {})
          w from new;
@2        res := res with w;
@3        new := new + ({n1 in g-inv{w} | n1 ~= x} - res);
      end;

      |- res = reachunder(g, t, x)

We can simplify (2) slightly by modifying it to compute the
set:

    reachunder(g, t, x) + {x}

instead of reachunder(g, t, x). We proceed in the following way.
The expression

    {xl : xl in g-inv{w} | xl ~= x}

is equivalent to

    g-inv{w} - {x}

and so we replace the statement at @3 by

(3)    new := new + (g-inv{w} - (res + {x}));

We next introduce a variable reach, and insert the assignment

    reach := {x};

after @1 and

    reach := reach with w;

after @2. The assertion

    |- reach = res + {x}

is then available everywhere. We can therefore replace the
output assertion by

    |- reach = reachunder(g, t, x) + {x}

and replace the expression (res + {x}) in (3) by reach. Finally,
we eliminate the variable res using the deletion rule. The
result is the praa:


(4)   |= {[n2, n1] : [n2, n1] in g-inv | n1 ~= x} : map(nodes(g)) nodes(g)
         & {z | [z, x] in g & backedge([z, x], t)} subset nodes(g)

      new := {z | [z, x] in g & backedge([z, x], t)};
      reach := {x};
      (while new ~= {})
          w from new;
          reach := reach with w;
          new := new + (g-inv{w} - reach);
      end;

      |- reach = reachunder(g, t, x) + {x}

Let us now return to (10) in step I, insert the code block
(4) before place @0, and replace the expression

    r in reachunder(g, t, x)

by (r in reach). This transformation introduces the assumption

    |= r in reach <-> r in reachunder(g, t, x)

which can be verified using the input assumption:

    r ~in range g

Indeed, x is the target of a back edge and therefore lies in
range g, so r ~= x. We also replace the expression

    reachunder(g, t, x) + {x}

at @2 by reach. The resulting version of the algorithm is given
below.


(5)   proc testreduce(g) returns reduceflag;

      |= flowgraph(g, r) & r ~in range g &
         (A e in g | e(1) ~= e(2))

      gO := g;
      (while exists [t, n] in depth-first-trees(g, r),
             #nodes(g) >= y >= 1 | target-back-edge(n-inv(y), t, g))
          x := n-inv(y);

@1        |= {[n2, n1] : [n2, n1] in g-inv | n1 ~= x} :
                 map(nodes(g)) nodes(g)
             & {z | [z, x] in g & backedge([z, x], t)} subset nodes(g)

          new := {z | [z, x] in g & backedge([z, x], t)};
          reach := {x};
          (while new ~= {})
              w from new;
              reach := reach with w;
              new := new + (g-inv{w} - reach);
          end;

          if r in reach then
              return false;
          else
              g := collapse(g, reach, x);
          end;
      end;
      return true;

      |- reduceflag <-> reducible(gO, r)

      end;


Notice that the assumption at @1, which is the input
assumption of (4), must still be verified. This assumption
follows from the definitions.

5.1.6 Step III - Elimination of Iterative Calculation
of Spanning Tree


In  the  praa  (5)  above,  a  depth  first  spanning  tree  for  g  is 
computed  at  each  iteration  of  the  outer  loop.  In  the  step  which 
now  follows,  we  transform  the  algorithm  so  that  a  depth  first 
spanning  tree  is  constructed  only  once.  We  divide  this  step  into 
two parts. First we formally differentiate ([Pa79], [Sh79]) the
expression

(1)    arb depth-first-trees(g, r)

with respect to the collapsing of g. That is, given that g will
be updated by this specific modification, we show that a related
updating of t and n produces a new depth first spanning tree of
g. Note that collapsing a graph is an update operation, since we
can rewrite the definition (D12) of collapse as

    collapse(g, i, x) =
        g - {[u, v] : [u, v] in g | u in i} +
        {[x, v] : [w, v] in g | w in i & v ~in i}
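
The update form of collapse can be compared with the more familiar description of collapsing as "rename every node of i to x and discard self loops". The Python sketch below is an illustration only; collapse transcribes the update form above, while quotient is a hypothetical helper introduced for the comparison. The two are expected to agree only when every edge entering i from outside enters at x, which is the situation for an interval with head x.

```python
def collapse(g, i, x):
    """Update form: delete the edges leaving i, then add the exits of i at x."""
    removed = {(u, v) for (u, v) in g if u in i}
    added = {(x, v) for (w, v) in g if w in i and v not in i}
    return (g - removed) | added


def quotient(g, i, x):
    """Rename every node of i to x and drop the resulting self loops."""
    def rename(z):
        return x if z in i else z
    return {(rename(u), rename(v)) for (u, v) in g if rename(u) != rename(v)}
```

Writing collapse as a deletion followed by an insertion is what makes it an update operation in the sense required for formal differentiation.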

However, if we expand (1) using the relevant foregoing
definitions, we obtain an expression which is too complex to be
formally differentiated. This is because we are formally
differentiating more intricate data structures than the simple
maps and sets handled by Paige and Sharir. We therefore must
prove directly that the transformation that we will use is
correct. After this, we can show that the update code supplied
by the above transformation is in fact not necessary, and so
we can delete it.

We begin by specifying update code for the spanning tree t
and node map n.

Suppose i is a strong interval with head x, and gl is the
graph obtained by collapsing g by this interval; that is,

    gl = collapse(g, i, x).

Then we can collapse t in the same way to obtain tl; that is,

    tl = collapse(t, i, x).

To obtain a new depth first spanning tree for gl, we must also
update the node map n. The new node map nl will agree with the
old node map for all nodes z such that n(z) <= n(x). More
precisely, we will define nl as follows:

(i) for z in nodes(t) such that n(z) <= n(x), nl(z) = n(z);

(ii) for z in nodes(t) - i such that z is a descendant of x,
nl(z) is n(z) less the number of nodes in i - {x} which have node
numbers less than n(z), and

(iii) for z in nodes(t) such that n(z) > n(x) but z is not a
descendant of x, nl(z) is n(z) less the number of nodes in i
- {x}. More formally, we define

    collapse-map(t, n, i, x) =
        {[z, n(z)] : z in nodes(t) | n(z) <= n(x)} +
        {[z, n(z) - #{w : w in i | n(w) < n(z)} + 1] : z in nodes(t) - i |
            haspath(t, x, z)} +
        {[z, n(z) - #i + 1] : z in nodes(t) | n(z) > n(x) &
            ~haspath(t, x, z)}

We then make the following definition of a collapsed tree and
corresponding collapsed node map:

    collapse-tree(t, n, i, x) =
        [collapse(t, i, x), collapse-map(t, n, i, x)]
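
The three cases of the node map update can be tried out on a small tree. The Python sketch below is an illustration under assumptions: the tree is given as a parent dictionary, n is a preorder numbering, every node of i is a descendant of x, and the count in case (ii) is taken over the nodes of i numbered below n(z), as in the prose description. The names collapse_map and compact are hypothetical; compact simply renumbers the surviving nodes 1, 2, ... in their old order.

```python
def collapse_map(parent, n, i, x):
    """Renumber the nodes that survive collapsing the interval i onto its head x."""
    def below_x(z):                     # haspath(t, x, z)
        while z is not None:
            if z == x:
                return True
            z = parent[z]
        return False

    n1 = {}
    for z in n:
        if n[z] <= n[x]:                                  # case (i)
            n1[z] = n[z]
        elif z in i:                                      # merged into x
            continue
        elif below_x(z):                                  # case (ii)
            n1[z] = n[z] - sum(1 for w in i if n[w] < n[z]) + 1
        else:                                             # case (iii)
            n1[z] = n[z] - len(i) + 1
    return n1


def compact(n, i, x):
    """Number the surviving nodes consecutively, keeping their old order."""
    kept = sorted((z for z in n if z == x or z not in i), key=n.get)
    return {z: k for k, z in enumerate(kept, start=1)}
```

Under these assumptions the case analysis amounts to closing the gaps left by the deleted nodes, so both functions return the same map on the examples tried.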

In the next section of this chapter we prove the following lemma,
which states that collapse-tree(t, n, i, x) is a depth first
spanning tree pair for gl.

LEMMA 5.

    i = reachunder(g, t, x) + {x} & r ~in i &
    gl = collapse(g, i, x) & [t, n] in depth-first-trees(g, r)
    & n(x) = max / n[targ-back-edges(t, g)]
    & [tl, nl] = collapse-tree(t, n, i, x)
    ->
    [tl, nl] in depth-first-trees(gl, r)

Therefore, when g is collapsed, t and n can be maintained as a
depth first spanning tree of g by inserting the following
update code:

    [tl, nl] := collapse-tree(t, n, reach, x);

We prepare for the major transformation outlined above by first
making the implicit assignments to t, n, and y in praa (5) above
explicit. That is, we expand the while loop header to produce the
following code:

(1)   (while E [t, n] in depth-first-trees(g, r),
             #nodes(g) >= y >= 1 | target-back-edge(n-inv(y), t, g))
@0        [t, n] := arb depth-first-trees(g, r);
          y := max / {k : 1 <= k <= #nodes(g) |
                      target-back-edge(n-inv(k), t, g)};
          x := n-inv(y);

We now apply Lemma 5 to differentiate the expression at @0. This
results in the following praa:

(2)   proc testreduce(g) returns reduceflag;

      |= flowgraph(g, r) & r ~in range g &
         (A e in g | e(1) ~= e(2))

      gO := g;
      [tl, nl] := arb depth-first-trees(g, r);

@0    (while E [t, n] in depth-first-trees(g, r),
             #nodes(g) >= y >= 1 | target-back-edge(n-inv(y), t, g))
          [t, n] := [tl, nl];
          y := max / {k : 1 <= k <= #nodes(g) |
                      target-back-edge(n-inv(k), t, g)};
          x := n-inv(y);

          new := {z | [z, x] in g & backedge([z, x], t)};
          reach := {x};
          (while new ~= {})
              w from new;
              reach := reach with w;
              new := new + (g-inv{w} - reach);
          end;

          if r in reach then
              return false;
          else
              g := collapse(g, reach, x);
          end;

@1        [tl, nl] := collapse-tree(t, n, reach, x);
      end;
      return true;

      |- reduceflag <-> reducible(gO, r)

      end;
Next we make the following observation, which allows us to
simplify the while loop test at @0 in (2).

    E [t, n] in depth-first-trees(g, r), x |
        target-back-edge(x, t, g)
    <->
    A [t, n] in depth-first-trees(g, r) |
        E x | target-back-edge(x, t, g)

This remark follows immediately from Lemma 2, and allows us to
rewrite the loop header, putting it as

(3)   (while E #nodes(g) >= y >= 1 |
             target-back-edge(n-inv(y), t, g))

As an additional cleanup step, we notice that the assignment

    [t, n] := [tl, nl];

can be deleted, if we replace all other occurrences of tl and nl
by t and n. After this, we can change the outer while E header
(3) back into a while exists header which implicitly assigns a
value to y. The result is the header:

(4)   (while exists #nodes(g) >= y >= 1 |
             target-back-edge(n-inv(y), t, g))
          x := n-inv(y);

This  completes  the  first  part  of  our  transformation. 

We  next  claim  that  the  code  which  updates  tl  and  nl  at  @2  in 
(2)  is  not  needed,  since  future  references  to  t  and  n  access 
parts  of  the  tree  which  are  not  altered  by  the  updating  of  tl  and 
nl.  We  can  therefore  use  the  assignment  deletion  rule.  The 
variables  t  and  n  are  not  dead  at  @1,  so  the  system  will  respond 
as  follows.  The  system  generates  temporary  variables  t'  and  n', 
to  store  the  original  values  of  t  and  n,  and  introduces  enabling 
assumptions  into  the  text,  to  protect  the  exposed  uses  of  t  and 
n.  Occurrences  of  t  and  n  in  (7)  are  changed  to  t'  and  n'.  We 
will  eliminate  these  tagged  variables  once  we  have  verified  the 
enabling  assumptions. 

The  deletion  rule  gives  the  following  praa. 

(5)   proc testreduce(g) returns reduceflag;

      |= flowgraph(g, r) & r ~in range g &
         (A e in g | e(1) ~= e(2))

      gO := g;
      [t, n] := arb depth-first-trees(g, r);
@0    [t', n'] := [t, n];

@1    (|= E #nodes(g) >= y >= 1 |
              target-back-edge(n-inv(y), t, g)
          <->
          E #nodes(g) >= y >= 1 | target-back-edge(n'-inv(y), t', g)
          &
          max / {y : 1 <= y <= #nodes(g) |
                 target-back-edge(n-inv(y), t, g)} =
          max / {y : 1 <= y <= #nodes(g) |
                 target-back-edge(n'-inv(y), t, g)}
       while exists #nodes(g) >= y >= 1 |
             target-back-edge(n-inv(y), t, g))

@2        |= n-inv(y) = n'-inv(y)
          x := n-inv(y);

@3        |= {z | [z, x] in g & backedge([z, x], t)} =
             {z | [z, x] in g & backedge([z, x], t')}
          new := {z | [z, x] in g & backedge([z, x], t)};
          reach := {x};
          (while new ~= {})
              w from new;
              reach := reach with w;
              new := new + (g-inv{w} - reach);
          end;

          if r in reach then
              return false;
          else
              g := collapse(g, reach, x);
          end;

@4        t' := collapse(t', reach, x);
          n' := {[y, k] : [y, k] in n' | 1 <= k & k <= n'(x)} +
                {[y, k - #reach + 1] : [y, k] in n' | k >= n'(x) + #i};
      end;
      return true;

      |- reduceflag <-> reducible(gO, r)

      end;
In this version, variables t' and n' are initialized at @0, and
enabling assumptions have been inserted at @1, @2, and @3. The
statements at @4 will be deleted once the assumptions are
verified. These assumptions will be verified in full detail in
the next section of this chapter; their proof follows from the
lemma below.

LEMMA 6.

    i = reachunder(g, t, x) + {x} & r ~in i
    & gl = collapse(g, i, x)
    & [t, n] in depth-first-trees(g, r) &
    [tl, nl] = collapse-tree(t, n, i, x)
    & n(x) = max / {k : 1 <= k <= #nodes(g) |
                    target-back-edge(n-inv(k), t, g)}
    ->
    (i) targ-back-edges(tl, gl) = targ-back-edges(t, g) - {x}.

[That is, no new back edge targets are introduced by the collapse
operation and x is no longer a back edge target in gl.]

    (ii) {e : e in gl | backedge(e, tl)} = {e : e in gl | backedge(e, t)}

[That is, if e is an edge in the collapsed graph gl, and e is a
backedge with respect to the collapsed tree tl, then e is a
backedge with respect to the original tree t. Note that even
though t is not a depth first spanning tree of the graph gl, but
of g, the predicate backedge(e, t) makes no reference to a
specific graph g or gl, so that (ii) makes sense with respect to
our definitions.]

Since the proofs of the assumptions in (5) are far from
obvious, we give an intuitive proof of the most difficult
assumption at @1, given Lemma 6. First of all, the assumption is
true when the loop is first entered since t = t' and n = n', so
we must show that the assumption is loop invariant. We therefore
assume that @1 is true, and show that @1 remains true for the new
values after g, t, and n are updated. Before the loop body is
executed, the values of g, t, and n correspond to g, t, and n in
the hypothesis of Lemma 6, and after the loop body is executed, the
new values correspond to gl, tl, and nl in Lemma 6. Now, we have
chosen x to be the back edge target with the highest node number.
Therefore, if there is a back edge target, say xl, in the
collapsed graph gl, then by (i) above, n(xl) <= n(x). Since the
node map nl agrees with the node map n for all nodes in the range
1 <= k <= n(x), we conclude that nl(xl) = n(xl). Therefore, we
can apply (ii) above and conclude that xl is a back edge target
iff it is a back edge target in the collapsed graph gl with
respect to the tree t, and by inductive hypothesis, with respect
to the original tree t'.

Once  these  assumptions  are  verified,  we  can  delete  the 
assignments  to  the  variables  t'  and  n'.  This  leaves  us  with  the 
praa  below. 

(6)   proc testreduce(g) returns reduceflag;

      |= flowgraph(g, r) & r ~in range g &
         (A e in g | e(1) ~= e(2))

      gO := g;
      [t, n] := arb depth-first-trees(g, r);

      (while exists #nodes(g) >= y >= 1 |
             target-back-edge(n-inv(y), t, g))
          x := n-inv(y);
          new := {z | [z, x] in g & backedge([z, x], t)};
          reach := {x};
          (while new ~= {})
              w from new;
              reach := reach with w;
              new := new + (g-inv{w} - reach);
          end;

          if r in reach then
              return false;
          else
              g := collapse(g, reach, x);
          end;
      end;
      return true;

      |- reduceflag <-> reducible(gO, r)

      end;


Next we make the following observation. If an edge [v, x]
in  g  is  a  backedge,  then  v  is  clearly  in  reachunder (g,  t,  x). 
Therefore,  after  g  is  collapsed,  v  is  not  in  the  collapsed  graph, 
so  [v,  x]  is  no  longer  a  backedge.  Furthermore,  according  to 
Lemma  6,  no  new  backedges  are  introduced.  Therefore,  x  is  not  a 
backedge  target  in  the  collapsed  graph.  We  next  observe  that  the 
nodes  of  a  collapsed  graph  form  a  subset  of  the  nodes  of  the 
original  graph.  Therefore,  we  can  convert  the  outer  while  loop 
into  a  forall-loop  which  iterates  over  the  nodes  of  gO.  This 
uses  the  following  transformation  rule. 


    (while exists EXP1 >= x >= EXP2 | FM)       |= EXP1 = EXP
        SB                              =>      (forall EXP >= x >= EXP2 | FM)
    end;                                            SB
                                                    |= (A EXP1 >= i >= x | ~FM(x\i)) &
                                                       (A x >= i >= EXP1 | ~FM(x\i))
                                                end;

where EXP and EXP2 are expressions which must be constant in the
loop.

The expression EXP1 corresponds to #nodes(g) in our praa,
and we replace it by the loop constant expression #nodes(gO),
which yields the following praa:

(7)   proc testreduce(g) returns reduceflag;

      |= flowgraph(g, r) & r ~in range g &
         (A e in g | e(1) ~= e(2))

      gO := g;
      [t, n] := arb depth-first-trees(g, r);

@0    |= #nodes(g) = #nodes(gO)
      (forall #nodes(gO) >= y >= 1 | target-back-edge(n-inv(y), t, g))
          x := n-inv(y);
@3        new := {z | [z, x] in g & backedge([z, x], t)};
          reach := {x};
          (while new ~= {})
              w from new;
              reach := reach with w;
              new := new + (g-inv{w} - reach);
          end;

@1        if r in reach then
              return false;
          else
              g := collapse(g, reach, x);
          end;

@2        |= (A #nodes(g) >= i >= y |
                 ~target-back-edge(n-inv(i), t, g)) &
             (A y >= i >= #nodes(g) | ~target-back-edge(n-inv(i), t, g))
      end;
      return true;

      |- reduceflag <-> reducible(gO, r)

      end;

The assumption at @0 is easily verified. The assumption at
@2 follows from Lemma 6, since x ~in targ-back-edges(tl, gl).

We also know from Lemma 6 that

    A y >= k >= 1 | (target-back-edge(n-inv(k), t, g) <->
                     target-back-edge(n-inv(k), t, gO))

We therefore can replace g by gO in the forall-loop header.

5.1.7 Step IV - Introduction of the Union-Find Mechanism

In  this  step  we  reformulate  the  praa  to  use  the  union-find 
mechanisms described by Hopcroft, Ullman, and Tarjan [AHU74],
[Ta75]. This is appropriate since collapsing the graph
corresponds to a union operation, and obtaining a node in a
collapsed graph corresponds to a find operation. These
operations are performed on disjoint subsets s of nodes of the
original graph gO, which have the following properties: (i) each
s is strongly connected, and (ii) each s has a unique entry point
x. (This entry point x corresponds to a node in the current
collapsed  graph.  Note  that  the  subsets  s  with  which  we  work  are 
not  always  strongly-connected  intervals,  since  there  may  be  a 
cycle  in  s  which  does  not  contain  x.)  Initially,  gO  is  divided 
into  subsets  consisting  of  a  single  node  each.  Whenever  a 
strongly  connected  interval  is  found,  the  subsets  containing  the 
nodes  of  that  interval  are  merged. 

To realize this reformulation, we introduce a map head, such
that nodes u and v are in the same strongly connected set s iff

    head-lim(u) = head-lim(v)

Head is initially empty. Then, whenever an interval i with head
x is found, head(y) is set to x, for all nodes y in i - {x}, thereby
merging subsets. At each iteration of the reformulated algorithm,
the nodes of the current collapsed graph are the limit nodes of
the head map. Therefore, the map head will replace the variable
g.

To achieve the desired reformulation, we first insert the
statement:

    head := {};

before @0 in the last version, and the statement

(1)    head := head + {[z, x] : z in reach - {x}};

before @1. After this is done, head will have the following
properties.

(2)    head : smap(nodes(gO)) nodes(gO)

(3)    nocycles(head)

To prove these properties, we note that the assertion

    |- nodes(g) * dom head = {}

is provable after g is collapsed, since the collapse operation
deletes precisely those nodes for which head is defined.
Therefore, whenever the statement (1) is executed, we know that y
~in dom head and x ~in dom head, so that no cycles can be formed.

The next lemma tells us that given (2) and (3), head-lim(z)
is defined for all nodes z in gO.

LEMMA 7.

    f : smap(s) s &
    nocycles(f) ->
    (A x in s | E y in s | f-lim(x) = y)
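
Lemma 7 is what makes the limit operation safe to compute by simple iteration. The Python sketch below is an illustration under the assumption that f-lim(x) denotes the node obtained by applying f repeatedly until a node outside dom f is reached; head is a dictionary, and head_lim and same_set are hypothetical names.

```python
def head_lim(head, u):
    """Follow head from u until a node outside dom head is reached."""
    while u in head:
        u = head[u]
    return u


def same_set(head, u, v):
    """Nodes belong to the same merged subset iff their limits coincide."""
    return head_lim(head, u) == head_lim(head, v)
```

If head contained a cycle, the while loop would not terminate, which is why the nocycles hypothesis is needed. The efficient implementations cited in this chapter shorten these chains by path compression; the sketch omits that refinement.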

After the insertions described above, we replace references to g
by references to head. We claim that the following equality is
true at the head of the forall-loop.

(4)    g = {[head-lim(u), v] : [u, v] in gO | v ~in dom(head)
                & head-lim(u) ~= v}

Statement (4) follows from the next lemma, which we prove later.


LEMMA 8.

    head : smap(nodes(gO)) nodes(gO) & nocycles(head) &
    g = {[head-lim(u), v] : [u, v] in gO | v ~in dom head &
            head-lim(u) ~= v} &
    strong-interval(i, g, x) & gl = collapse(g, i, x) &
    dom head * nodes(g) = {} &
    headl = head + {[w, x] : w in i - {x}}
    ->
    gl = {[headl-lim(u), v] : [u, v] in gO | v ~in dom headl &
            headl-lim(u) ~= v}

We can therefore replace the remaining uses of g in the praa by
the right hand side of (4). First consider the expression

    g-inv{y} - reach.

This becomes:

    {[head-lim(u), v] : [u, v] in gO-inv | v ~in dom(head)
        & head-lim(u) ~= v}{y} - reach

which is equivalent to

    {head-lim(u) : u in gO-inv{y} | y ~in dom(head) &
        head-lim(u) ~= y} - reach

Note that since y is a node of g, y is not in the domain of head.
Moreover, since y is in reach, we can replace the above expression by

    {head-lim(u) : u in gO-inv{y}} - reach.

Next, we replace g at @3 to obtain the expression:

    {z | [z, x] in {[head-lim(u), v] : [u, v] in gO |
        v ~in dom head & head-lim(u) ~= v} & backedge([z, x], t)}

This expression is equivalent to

    {head-lim(u) : [u, x] in gO | x ~in dom head &
        head-lim(u) ~= x & backedge([head-lim(u), x], t)}

This can be further simplified. We know that x ~in dom head.
Suppose head-lim(u) = x. Then x was a back edge target during a
previous iteration, which is impossible. Therefore, the above
expression simplifies to:

    {head-lim(u) | [u, x] in gO & backedge([head-lim(u), x], t)}.

We prove in the next section that

    backedge([head-lim(u), x], t) <-> backedge([u, x], t)

leading to the following further simplification of the above
expression:

    {head-lim(u) | [u, x] in gO & backedge([u, x], t)}

Finally, having removed all references to g, we can delete the
assignments to g, yielding the following praa:

(4)  proc testreduce(g) returns reduceflag;

        |= flowgraph(g, r) & r ~in range g &
           (A e in g | e(1) ~= e(2))

        g0 := g;
        [t, n] := arb depth-first-trees(g, r);
        head := {};
        (forall #nodes(g0) >= y >= 1 |
                target-back-edge(n-inv(y), t, g0))
           x := n-inv(y);
           new := {head-lim(u) : [u, x] in g |
                   backedge([u, x], t)};
           reach := {x};
           (while new ~= {})
              y from new;
              reach := reach with y;
  @1          new := new + ({head-lim(u) : u in g0-inv{y}} - reach);
           end;
  @2       head := head + {[u, x] : u in reach - {x}};
           if r in reach then
              return false;
           end;
        end;
        return true;

        |- reduceflag <-> reducible(g0, r)

     end;

As an additional transformation, we observe that the map head can be updated while the set reach is being formed. That is, whenever an element w is added to reach, we can set head(w) to x, and thereby eliminate the statement at @2. This transformation can be achieved in a relatively mechanical way by a loop fusion transformation as described by Sharir [Sh79]. Sharir's technique is the following. Suppose a statement S with a set former, which is itself an implicit loop, refers to a variable whose value is calculated in a preceding loop. Then S can be moved into the loop, and the set expression contained in S can be formally differentiated. In our example, movement of statement @2 into the while loop which precedes it is not a simple code motion transformation, since the variable head is used inside the loop. However, we can prove that the value of the expression at @1 is not changed by the code motion transformation. For this, we proceed as follows.

Using the transformation described in section 4.1.8, we move statement @2 to the place before @1. The system generates a new temporary variable head' to store the original value of head, and returns the following praa:
(5)  proc testreduce(g) returns reduceflag;

        |= flowgraph(g, r) & r ~in range g &
           (A e in g | e(1) ~= e(2))

        g0 := g;
        [t, n] := arb depth-first-trees(g, r);
        head := {};
  @0    head' := head;
        (forall #nodes(g0) >= y >= 1 |
                target-back-edge(n-inv(y), t, g0))
           x := n-inv(y);
           new := {head-lim(u) : [u, x] in g |
                   backedge([u, x], t)};
           reach := {x};
           (while new ~= {})
              w from new;
              reach := reach with w;
  @1          head := head + {[u, x] : u in reach - {x}};
  @2          |= {head-lim(u) : u in g0-inv{w}} - reach =
                 {head'-lim(u) : u in g0-inv{w}} - reach
              new := new + ({head-lim(u) : u in g0-inv{w}} - reach);
           end;
  @3       head' := head' + {[u, x] : u in reach - {x}};
           if r in reach then
              return false;
           end;
        end;
        return true;

        |- reduceflag <-> reducible(g0, r)

     end;

The variable head' is initialized at @0. At @3, head has been changed to head', since this statement will be deleted once the enabling assumption at @2 is verified. This enabling assumption assures that the code motion transformation has not affected any exposed uses of head.

It is easy to show that

   head = head'

is true at the head of the forall-loop. To prove the assumption at @2, we note the following.

   {head-lim(z) : z in g0-inv{y}} - reach =
   {head-lim(z) : z in g0-inv{y} | head'-lim(z) ~in reach} +
   {head-lim(z) : z in g0-inv{y} | head'-lim(z) in reach} - reach

For all nodes z such that head'-lim(z) = v is in reach, we know that head(v) = x. Therefore, head-lim(z) = x. For the remaining nodes z, head and head' agree along the chain from z, so head-lim(z) = head'-lim(z). We can therefore replace the above expression by:

   {head'-lim(z) : z in g0-inv{y} |
    head'-lim(z) ~in reach} + {x} - reach

Since x is in reach, this is equivalent to

   {head'-lim(z) : z in g0-inv{y}} - reach.

This proves the assumption at @2.

The temporary variable head' can now be eliminated. Then, we formally differentiate the statement at @1, and obtain

   head := head + {[w, x]};

The final version of the praa is given below.

(6)  proc testreduce(g) returns reduceflag;

        |= flowgraph(g, r) & r ~in range g &
           (A e in g | e(1) ~= e(2))

        g0 := g;
        [t, n] := arb depth-first-trees(g, r);
        head := {};
        (forall #nodes(g0) >= y >= 1 |
                target-back-edge(n-inv(y), t, g0))
           x := n-inv(y);
           new := {head-lim(u) : [u, x] in g |
                   backedge([u, x], t)};
           reach := {x};
           (while new ~= {})
              w from new;
              reach := reach with w;
              head := head + {[w, x]};
              new := new + ({head-lim(u) : u in g0-inv{w}} - reach);
           end;
           if r in reach then
              return false;
           end;
        end;
        return true;

        |- reduceflag <-> reducible(g0, r)

     end;
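The effect of the loop fusion step can be illustrated with a small Python sketch. It is not part of the praa system: the function names, the dictionary encoding of head, and the sample graph are assumptions made for illustration only. The first function updates head after the while loop (as at @2 in version (4)); the second updates it inside the loop (as in version (6)). On the sample data both produce the same reach set and the same head map.

```python
def head_lim(head, u):
    """Follow head from u until a node with no head entry is reached."""
    while u in head:
        u = head[u]
    return u


def grow_batch(pred, head, x, new):
    """While loop of version (4): head is updated once, after the loop."""
    head, new, reach = dict(head), set(new), {x}
    while new:
        w = new.pop()
        reach.add(w)
        new |= {head_lim(head, u) for u in pred.get(w, ())} - reach
    head.update({u: x for u in reach - {x}})
    return reach, head


def grow_fused(pred, head, x, new):
    """While loop of version (6): head(w) is set as soon as w joins reach."""
    head, new, reach = dict(head), set(new), {x}
    while new:
        w = new.pop()
        reach.add(w)
        head[w] = x
        new |= {head_lim(head, u) for u in pred.get(w, ())} - reach
    return reach, head


# Hypothetical data: g0 has edges r->a, a->b, b->c, c->b, c->a, and an
# earlier iteration has already recorded head(c) = b.
pred = {"a": ["r", "c"], "b": ["a", "c"], "c": ["b"]}
batch = grow_batch(pred, {"c": "b"}, "a", {"b"})
fused = grow_fused(pred, {"c": "b"}, "a", {"b"})
```

Both calls start from the back edge [c, a], whose source has limit b under the earlier head map.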
5.1.8 Step V - Final Optimizations

The remaining transformations performed in this section include

1.) implementation of the head map by means of a compressed balanced tree representation;

2.) supply of a procedure for computing the depth first spanning tree of a flow graph;

3.) motion of constant expressions out of inner loops.

These transformations are all quite straightforward, and require relatively little insight on the part of the system user. For example, code motion of constant expressions is well within the range of current optimization techniques. Moreover, recognizing that compressed balanced trees can be used profitably is not difficult, since, in the last section, we have carefully exposed union-find operations, performed on the map, head. However, the realization that the union find mechanism should be used is a deep insight, well beyond what we would expect from an automatic system.

To implement head as a compressed balanced tree, we use the data structure transformation described in section 4.2.1.3. Before inserting calls to findlim and balance into our praa, we first expand the statements which contain references to head-lim as follows:


(1)  proc testreduce(g) returns reduceflag;

        |= flowgraph(g, r) & r ~in range g &
           (A e in g | e(1) ~= e(2))

        g0 := g;
        [t, n] := arb depth-first-trees(g, r);
  @0    head := {};
        (forall #nodes(g0) >= y >= 1 |
                target-back-edge(n-inv(y), t, g0))
           x := n-inv(y);
           new := {};
           (forall v in nodes(g0) | [v, x] in g0 & backedge([v, x], t))
  @1          h := head-lim(v);
              new := new with h;
           end;
           reach := {x};
           (while new ~= {})
              w from new;
              reach := reach with w;
  @2          head(w) := x;
              (forall u in g0-inv{w})
  @3             h := head-lim(u);
                 if h ~in reach then
                    new := new with h;
                 end;
              end;
           end;
           if r in reach then
              return false;
           end;
        end;
        return true;

        |- reduceflag <-> reducible(g0, r)

     end;

In the praa (1), we have expanded set formers containing references to head-lim into explicit loops. This is necessary because the function findlim returns a pair of variables.

Next, to incorporate the code which implements the compressed balanced tree representation into praa (1), we replace the variable f by head, and s by nodes(g0). It is easy to see that the enabling conditions for applying the data structure transformation are satisfied. Indeed, the only occurrences of head are of the form:

   var := head-lim(z);

where z is in the set nodes(g0), and

   head(w) := x;

where both w and x are in nodes(g0). We insert initialization code after @0, replace statements @1 and @3 by calls to findlim, and insert a call to balance after @2. After these modifications are made, there are no longer any uses of head, so we can delete assignments to this variable. We obtain the praa:

(2)  proc testreduce(g) returns reduceflag;

        |= flowgraph(g, r) & r ~in range g &
           (A e in g | e(1) ~= e(2))

        g0 := g;
  @0    [t, n] := arb depth-first-trees(g, r);
        fcomp := {};
        froot := {[z, z] : z in nodes(g0)};
        count := {[z, 1] : z in nodes(g0)};
  @2    (forall #nodes(g0) >= y >= 1 |
                target-back-edge(n-inv(y), t, g0))
           x := n-inv(y);
           new := {};
  @1       (forall v in nodes(g0) | [v, x] in g0 & backedge([v, x], t))
              [fcomp, h] := findlim(fcomp, v);
              new := new with h;
           end;
           reach := {x};
           (while new ~= {})
              w from new;
              reach := reach with w;
              [froot, fcomp, count] := balance(froot, fcomp, count, w, x);
              (forall u in g0-inv{w})
                 [fcomp, h] := findlim(fcomp, u);
                 if h ~in reach then
                    new := new with h;
                 end;
              end;
           end;
           if r in reach then
              return false;
           end;
        end;
        return true;

        |- reduceflag <-> reducible(g0, r)

     end;

Next we perform a series of simple optimizations. First of all, now that g, n, and t are loop constants, expressions depending on these variables can be moved outside the loop. More specifically we proceed as follows.

(i) The map n-inv is equivalent to the expression

   {[v, u] : [u, v] in n}.

Introducing the new variable nodevect, we insert the initialization statement

   nodevect := {[v, u] : [u, v] in n};

after @0 and replace all uses of n-inv by the variable nodevect.

(ii) Next, we introduce a map backinv such that backinv{x} is the set of all nodes z such that [z, x] is a backedge in g0. Then the iterator head at @1 can be replaced by an iteration over the nodes in backinv{x}. For this we insert the initialization code:

(3)  backinv := {[x, v] : [v, x] in g0 | backedge([v, x], t)};

and then simplify statement (3) by noting that [v, x] is a backedge iff v is a descendant of x, which, according to the definition of tree-node-map, is equivalent to:

   n(x) <= n(v) < n(x) + numdescs(t, x)

We therefore can replace (3) by the statement:

   backinv := {[w, v] : [v, w] in g0 |
               n(w) <= n(v) & n(v) < n(w) + numdescs(t, w)};

(iii) Next we replace the expression

(4)  target-back-edge(nodevect(y), t, g0)

at @2. It is clear that the set of back edge targets of g0 with respect to t is the domain of the map backinv. We therefore insert the initialization statement:

   targbackedges := dom backinv;

and replace expression (4) by

   nodevect(y) in targbackedges.

(iv) Finally, the expression nodes(g0) is a program constant. We introduce a variable nodeset, initialize it to nodes(g0) at the beginning of the program, and substitute nodeset for the expression nodes(g0).
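The interval test used in step (ii) can be sketched in Python as follows. The function names and the sample numbering are hypothetical; the sketch assumes, as the inequality above suggests, that the descendant count of a node includes the node itself.

```python
def is_back_edge(n, ndescs, v, x):
    """[v, x] is a back edge iff v lies in the preorder interval of x."""
    return n[x] <= n[v] < n[x] + ndescs[x]


def back_inverse(g0, n, ndescs):
    """Map each back edge target to the set of its back edge sources."""
    backinv = {}
    for v, w in g0:
        if is_back_edge(n, ndescs, v, w):
            backinv.setdefault(w, set()).add(v)
    return backinv


# Hypothetical spanning tree r -> a, a -> b, a -> c, numbered in preorder.
n = {"r": 1, "a": 2, "b": 3, "c": 4}
ndescs = {"r": 4, "a": 3, "b": 1, "c": 1}
g0 = [("r", "a"), ("a", "b"), ("a", "c"), ("b", "a"), ("c", "b")]

backinv = back_inverse(g0, n, ndescs)
targbackedges = set(backinv)      # plays the role of dom backinv
```

Here [b, a] is recognised as a back edge, while [c, b] is a cross edge and is rejected because n(c) falls outside the interval of b.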

These transformations result in the praa version shown below.

     proc testreduce(g) returns reduceflag;

        |= flowgraph(g, r) & r ~in range g &
           (A e in g | e(1) ~= e(2))

        g0 := g;
  @0    [t, n] := arb depth-first-trees(g, r);
        nodeset := nodes(g0);
        nodevect := {[v, u] : [u, v] in n};
  @1    backinv := {[v, u] : [u, v] in g | n(v) <= n(u) &
                    n(u) < n(v) + numdescs(t, v)};
        targbackedges := dom backinv;
        fcomp := {};
        froot := {[z, z] : z in nodeset};
        count := {[z, 1] : z in nodeset};
        (forall #nodeset >= y >= 1 | nodevect(y) in targbackedges)
           x := nodevect(y);
           new := {};
           (forall v in backinv{x})
              [fcomp, h] := findlim(fcomp, v);
              new := new with h;
           end;
           reach := {x};
           (while new ~= {})
              w from new;
              reach := reach with w;
              [froot, fcomp, count] := balance(froot, fcomp, count, w, x);
              (forall u in g0-inv{w})
                 [fcomp, h] := findlim(fcomp, u);
                 if h ~in reach then
                    new := new with h;
                 end;
              end;
           end;
           if r in reach then
              return false;
           end;
        end;
        return true;

        |- reduceflag <-> reducible(g0, r)

     end;
We  are  now  ready  to  supply  a  procedure  for  computing  the 
depth-first  spanning  tree  for  g,  and  eliminate  the 
nondeterministic  assignment  statement  at  @0.  The  depth  first 
spanning  tree  will  be  computed  by  an  off-line  function  dfst, 
which  also  computes  a  map  ndescs,  such  that  ndescs(x)  is  the 
number  of  descendants  of  x  in  t.  The  necessary  procedure  dfst  is 
as  follows: 


proc dfst(g1) returns t2, n2, ndescs;

   |= flowgraph(g1, r)

   t2 := {};
   n2 := {};
   ndescs := {[z, 1] : z in nodes(g1)};
   [t2, n2, ndescs] := treewalk(g1, r, t2, n2, ndescs, 1)(1:3);
   return;

   |- [t2, n2] in depth-first-trees(g1, r) &
      A z in nodes(t2) | ndescs(z) = numdescs(t2, z)

end;

proc treewalk(g, r1, t1, n1, ndescs1, c1)
     returns t2, n2, ndescs2, c2;

   |= g1 = g | dom t1 & [t1, n1] in depth-first-trees(g1, r)
      & c1 = max / range n1 + 1
      & r1 in range t1 & r1 ~in dom t1 &
      A z in nodes(t1) | ndescs1(z) = numdescs(t1, z)

   [t2, n2, ndescs2] := [t1, n1, ndescs1];
   n2(r1) := c1;
   c2 := c1 + 1;
   (while exists y in g{r1} | y ~in dom n2)
      t2 := t2 + [r1, y];
      [t2, n2, ndescs2, c2] := treewalk(g, y, t2, n2, ndescs2, c2);
      ndescs2(r1) := ndescs2(r1) + ndescs2(y);
   end;
   return;

   |- [t2, n2] in depth-first-trees(g2, r) &
      g2 = g | dom t2 &
      g2 = g1 + {e in g | haspath(g, r1, e(1))}
      & c2 = max / range n2 + 1 &
      A z in nodes(t2) | ndescs2(z) = numdescs(t2, z)

end treewalk;
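For readers who prefer an executable rendering, the following Python sketch computes the same three results: a spanning tree, a preorder numbering, and descendant counts. It is an illustration under stated assumptions rather than a transcription of the praa: the edge-list encoding and the nested helper are hypothetical, every node is assumed reachable from r, and each count includes the node itself and is accumulated after the recursive call returns.

```python
def dfst(g, r):
    """Return (t, n, ndescs) for the graph given as a list of edges g."""
    succ = {}
    for u, v in g:
        succ.setdefault(u, []).append(v)

    t, n, ndescs = set(), {}, {}

    def treewalk(r1, c):
        # Number r1, then walk each successor that is still unnumbered.
        n[r1] = c
        ndescs[r1] = 1
        c += 1
        for y in succ.get(r1, ()):
            if y not in n:
                t.add((r1, y))
                c = treewalk(y, c)
                ndescs[r1] += ndescs[y]
        return c

    treewalk(r, 1)
    return t, n, ndescs


g = [("r", "a"), ("a", "b"), ("a", "c"), ("b", "a"), ("c", "b")]
t, n, ndescs = dfst(g, "r")
```

Because the helper is recursive, very deep graphs may exceed Python's default recursion limit; an explicit stack avoids this.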

To make use of this code, we introduce a new variable ndescs, and replace statement @0 by the statement

(4)  [t, n, ndescs] := dfst(g);

To justify this transformation, we must verify the assumption

   |= ([t1, n1] in depth-first-trees(g, r) &
       (A z in nodes(t1) | ndescs(z) = numdescs(t1, z))) ->
      [t1, n1] in depth-first-trees(g, r)

which is trivial. We must also show that the input assumption of the function is satisfied; that is, the proposition

   flowgraph(g, r)

must be available before (4), and it clearly is. Also the assertion

   (A m in nodes(g0) | ndescs(m) = numdescs(t, m))

is available after (4), so that the expression numdescs(t, v) at @1 can be replaced by the expression ndescs(v). This gives a final praa version shown below.

     proc testreduce(g) returns reduceflag;

        |= flowgraph(g, r) & r ~in range g &
           (A e in g | e(1) ~= e(2))

        g0 := g;
        [t, n, ndescs] := dfst(g);
        nodeset := nodes(g0);
        nodevect := {[v, u] : [u, v] in n};
        backinv := {[v, u] : [u, v] in g | n(v) <= n(u) &
                    n(u) < n(v) + ndescs(v)};
        targbackedges := dom backinv;
        fcomp := {};
        froot := {[z, z] : z in nodeset};
        count := {[z, 1] : z in nodeset};
        (forall #nodeset >= y >= 1 | nodevect(y) in targbackedges)
           x := nodevect(y);
           new := {};
           (forall v in backinv{x})
              [fcomp, h] := findlim(fcomp, v);
              new := new with h;
           end;
           reach := {x};
           (while new ~= {})
              w from new;
              reach := reach with w;
              [froot, fcomp, count] := balance(froot, fcomp, count, w, x);
              (forall u in g0-inv{w})
                 [fcomp, h] := findlim(fcomp, u);
                 if h ~in reach then
                    new := new with h;
                 end;
              end;
           end;
           if r in reach then
              return false;
           end;
        end;
        return true;

        |- reduceflag <-> reducible(g0, r)

     end;
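As an end-to-end illustration, the listing above can be approximated by the following Python sketch. Several simplifications are assumptions of the sketch and not of the praa: the compressed balanced tree (findlim and balance) is replaced by a plain dictionary with path compression, the depth-first numbering uses an explicit stack, self-loops are skipped instead of being excluded by an input assumption, and the name is_reducible is hypothetical. As in the praa, the root r is assumed to have no incoming edges and every node is assumed reachable from r.

```python
from collections import defaultdict


def is_reducible(g, r):
    """Sketch of the reducibility test for edge list g with root r."""
    succ, pred = defaultdict(list), defaultdict(list)
    for u, v in g:
        succ[u].append(v)
        pred[v].append(u)

    # Preorder numbers n and subtree sizes ndescs (each node counts itself).
    n, ndescs, order = {r: 1}, {}, [r]
    stack = [(r, iter(succ[r]))]
    while stack:
        v, it = stack[-1]
        for y in it:
            if y not in n:
                order.append(y)
                n[y] = len(order)
                stack.append((y, iter(succ[y])))
                break
        else:
            stack.pop()
            ndescs[v] = len(order) - n[v] + 1

    head = {}

    def findlim(u):
        # Limit of u under head, compressing the path that was followed.
        root = u
        while root in head:
            root = head[root]
        while u in head and head[u] != root:
            head[u], u = root, head[u]
        return root

    # Visit back edge targets in decreasing preorder number.
    for x in reversed(order):
        back = [v for v in pred[x]
                if v != x and v in n and n[x] <= n[v] < n[x] + ndescs[x]]
        if not back:
            continue
        new = {findlim(v) for v in back}
        reach = {x}
        while new:
            w = new.pop()
            reach.add(w)
            head[w] = x
            for u in pred[w]:
                h = findlim(u)
                if h not in reach:
                    new.add(h)
        if r in reach:
            return False
    return True


loop_graph = [("r", "a"), ("a", "b"), ("b", "a"), ("b", "c")]
two_entry_loop = [("r", "a"), ("r", "b"), ("a", "b"), ("b", "a")]
```

The first sample graph has a single-entry loop and is reducible; the second has a loop that can be entered at either a or b and is not.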
5.2 Proof of Tarjan's Algorithm

In this section we give a detailed proof of the lemmas which are stated in section 5.1. This proof is close to the level of detail of an interactive proof carried out with an automatic proof checker; however, it is still only a proof scenario. In section 5.2.8 we expand the proof of several lemmas, filling in the details which would be required by an automatic proof checker. From this sample, we estimate that the proof would expand by a factor of 3 if checked automatically.

We first state some general theorems which are used in the proofs of the lemmas.

Whenever an element is selected from a set, it is necessary to show that the set is not empty. We will often be selecting an element from a non-empty set which is, for example, maximal or first in a list. The following metarule is useful for verifying that such an element exists.


(TH1)  reflexive-total-order(r, {y | fm}) &
       E y | fm ->
       E y | (fm & A z | fm(y \ z) -> r(y, z))

The next three theorems state properties of one-one maps.

(TH2)  x in s & f(x) > f(y)
       & f : bmap(s) s
       ->
       #{z in s | f(z) > f(y)} > #{z in s | f(z) > f(x)}

(TH3)  f : smap(s1) s2 & dom f = s1 & #s1 = #s2
       ->
       range f = s2 <-> f-inv : smap

(TH4)  f : bmap & A x in dom f | f(x) > c
       ->
       A z in dom f | f(z) >= c + #{y in dom f | f(y) < f(z)}

(TH5)  partial-order(r) & t : tuple &
       (A 1 <= i < #t | r(t(i), t(i+1))) ->
       A 1 <= j < #t, j < k <= #t |
          r(t(j), t(k))

Many of the lemmas are proved with inductive arguments. We use the following induction schema:

(TH6)  A s | pr(s) or (E s0 |
          ~pr(s0) & A s | f(s) < f(s0) -> pr(s))

where pr is a predicate and f a function. Typically, we will assume

   ~pr(s0) & A s | f(s) < f(s0) -> pr(s)

and derive a contradiction.

We introduce the following additional definitions.

   iscycle(p, g) <-> cycle(p) & ispath(p, g)

   simplepaths(g, x1, x2) =
      {p in paths(g, x1, x2) | simple(p)}

The lemmas below have the following assumptions as (implicit) hypotheses.

   flowgraph(g, r)

   r ~in range g

   (A e in g | e(1) ~= e(2))


5.2.1 Properties of Paths

We begin by proving several basic lemmas about paths, which will enable us to form new paths in subsequent lemmas.

LEMMA 1.1

   p1 : tuple & p2 : tuple
   ->
   edges(p1 || p2) = edges(p1) + edges(p2) + {[p1(#p1), p2(1)]}

PROOF.

By definition,

   edges(p1 || p2) =
      {[(p1 || p2)(i), (p1 || p2)(i+1)] : 1 <= i < #(p1 || p2)}

By identity,

   edges(p1 || p2) =
      {[p1(i), p1(i+1)] : 1 <= i < #p1} +
      {[p1(#p1), p2(1)]} +
      {[p2(i), p2(i+1)] : 1 <= i < #p2}

   = edges(p1) + edges(p2) + {[p1(#p1), p2(1)]}.

QED.
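The identity of Lemma 1.1 is easy to check on concrete tuples. The short Python sketch below uses lists for tuples and list concatenation for ||; the helper names edges and is_path are assumptions chosen to mirror the predicates of the text.

```python
def edges(p):
    """Set of consecutive pairs of the tuple p."""
    return {(p[k], p[k + 1]) for k in range(len(p) - 1)}


def is_path(p, g):
    """p is a path of g when all of its edges belong to g."""
    return edges(p) <= set(g)


p1, p2 = ["a", "b", "c"], ["d", "e"]
joined = edges(p1 + p2)
split = edges(p1) | edges(p2) | {(p1[-1], p2[0])}
```

The two sets joined and split coincide, and the single connecting edge (c, d) is the only one that belongs to neither edges(p1) nor edges(p2).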


LEMMA 1.2

   ispath(p, g) & [p(#p), v] in g & p1 = p || [v]
   ->
   ispath(p1, g)

PROOF.

We must show that

   edges(p1) subset g

By Lemma 1.1,

   edges(p1) = edges(p) + edges([v]) + {[p(#p), v]}

Since edges(p) subset g and [p(#p), v] in g by hypothesis, and edges([v]) = {},

   ispath(p1, g)

follows immediately.

QED.

LEMMA 1.3

   ispath(p1, g) & ispath(p2, g) & p1(#p1) = p2(1)
   & p = p1(1:#p1-1) || p2
   ->
   ispath(p, g)

PROOF.

Again, we must show

   edges(p) subset g

This follows immediately from Lemma 1.1.

QED.

The following lemma states properties of subpaths.

LEMMA 1.4

   1 <= i < #p & i < j <= #p & p1 = p(i:j)
   ->
   (i)   range p1 subset range p
   (ii)  ispath(p, g) -> ispath(p1, g)
   (iii) #p1 <= #p

These formulae are decidable.

QED.

The next lemma states that given any path from x1 to x2, we can always find a simple path from x1 to x2. We make the following inductive definition.

   defpath(q) = if #q = 1 then q
                elseif E 1 <= j < #q | q(j) = q(#q) then
                   defpath(q(1:j))
                else
                   defpath(q(1:#q-1)) || [q(#q)]
                end


LEMMA 1.5

   ispath(p, g) & p1 = defpath(p)
   ->
   p1 in paths(g, p(1), p(#p)) & simple(p1)
   & range p1 subset range p

PROOF.

We use induction on the length of p, applying the induction schema (TH6).

Assume the lemma is false for p, but true for all paths p' such that #p' < #p, and derive a contradiction.

Case I. #p = 1

Then

   defpath(p) = p

Since #p = 1, p is simple. Therefore,

   simple(p1) & p1 in paths(g, p(1), p(#p))
   & range p1 = range p

Case II. E 1 <= j < #p | p(j) = p(#p).

Then

   defpath(p) = defpath(p(1:j))

Since #p(1:j) < #p, we know that

   simple(p1) & p1 in paths(g, p(1), p(#p)) & range p1 subset range p

by the inductive hypothesis, and since p(j) = p(#p).

Case III. p(#p) ~in range p(1:#p-1)

Then

   defpath(p) = defpath(p(1:#p-1)) || [p(#p)]

Again,

   #p(1:#p-1) < #p & ispath(p(1:#p-1), g)

so we can apply the inductive assumption and conclude

   defpath(p(1:#p-1)) in paths(g, p(1), p(#p-1)) &
   simple(defpath(p(1:#p-1))) & range defpath(p(1:#p-1))
   subset range p(1:#p-1)

Therefore

   p1 in paths(g, p(1), p(#p)) &
   simple(p1) & range p1 subset range p.

QED.
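The inductive definition of defpath translates almost literally into Python. In this sketch paths are lists, and the existential choice of j is resolved by taking the first earlier occurrence of the last node; that choice is an assumption of the sketch, since the definition allows any such j.

```python
def defpath(q):
    """Extract a simple path with the same endpoints as the path q."""
    if len(q) == 1:
        return list(q)
    last = q[-1]
    if last in q[:-1]:
        j = q.index(last)            # first earlier occurrence of last
        return defpath(q[: j + 1])   # corresponds to defpath(q(1:j))
    return defpath(q[:-1]) + [last]


simple_path = defpath([1, 2, 3, 2, 4])
```

On the sample path the detour 2, 3, 2 is cut out, leaving a simple path with the original endpoints whose nodes all occur in the original path.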


LEMMA 1.6

   p in paths(g, x1, x1) & #p > 1
   ->
   E p1 in paths(g, x1, x1) | cycle(p1) & range p1 subset range p

PROOF.

Consider

   p' = p(1:#p-1).

Then

   p'(1) ~= p'(#p'),

since there are no edges of the form [w, w]; that is,

   p'(#p') ~= p(#p) = p(1).

By Lemma 1.5,

   E p1' in paths(g, x1, p'(#p')) | simple(p1') &
      range p1' subset range p'

Let

   p1 = p1' || [p(#p)].

Then

   cycle(p1) & p1 in paths(g, x1, x1) & range p1 subset range p.

QED.

The following lemma states that if p is a path from a node x1 not in a set s to a node x2 in s, then there is an edge in the path p such that the first node of the edge is not in s, and the second node of the edge is in s.

LEMMA 1.7

   p in paths(g, x1, x2) & x1 ~in s & x2 in s
   ->
   E 1 <= i < #p | p(i) ~in s & p(i+1) in s

PROOF.

We use induction on the length of p. Suppose

   ~E 1 <= i < #p | p(i) ~in s & p(i+1) in s

Consider [p(1), p(2)]. If p(2) in s, then we are done. Assume p(2) ~in s. Then we construct a path

   p1 = p(2:#p).

We can now apply the inductive hypothesis, since #p1 < #p.

QED.
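A direct search makes the statement of Lemma 1.7 concrete. The helper below is a hypothetical illustration: it scans a path given as a list and returns the first edge that leaves the complement of s and enters s, or None when no such edge exists.

```python
def crossing_edge(p, s):
    """First edge (p[k], p[k+1]) of p with p[k] not in s and p[k+1] in s."""
    for k in range(len(p) - 1):
        if p[k] not in s and p[k + 1] in s:
            return (p[k], p[k + 1])
    return None


found = crossing_edge(["r", "a", "b", "c"], {"b", "c"})
```

When the path starts outside s and ends inside s, as in the hypothesis of the lemma, the search cannot return None.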
LEMMA 1.8

   f : map(s) s & nocycles(f) &
   f1 = f + [u, v] & v ~in dom f
   ->
   nocycles(f1)

PROOF.

Suppose p is a cycle of f1. Then

   [u, v] in edges(p).

Otherwise,

   edges(p) subset f,

and there are no cycles in f. Suppose

   [p(i), p(i+1)] = [u, v]

Then, if i < #p-1,

   [p(i+1), p(i+2)] in edges(p),

and since p(i+1) = v, [p(i+1), p(i+2)] in f. But

   v ~in dom f.

Contradiction. Therefore, i = #p-1, and

   p(#p) = p(1) = v.

It follows that

   v in dom f,

which again contradicts the hypothesis. Therefore,

   nocycles(f1).

QED.
5.2.2 Properties of Collapse

LEMMA 2.1

If the root r of a flowgraph is not the target of an edge, then r is not in any strong interval.

   flowgraph(g, r) & (A e in g | e(2) ~= r) &
   strong-interval(i, g, x) ->
   r ~in i.

PROOF.

Suppose

   r in i.

Since there is a cycle in i, #i > 1. Hence

   E n in i | n ~= r

Since i is strongly connected, there is a path from n to r. So,

   E e in g | e(2) = r.

Contradiction.

QED.

The lemma below gives the number of nodes in the collapsed graph g1, assuming that g is collapsed by removing a set i, where i is single entry, and the entry node is not the root. This lemma is used subsequently to determine the number of nodes in a collapsed tree, as well as a collapsed graph.

LEMMA 2.2

   (A [u, v] in g | u ~in i & v in i -> v = x) & x ~= r
   & g1 = collapse(g, i, x)
   ->
   nodes(g1) = nodes(g) - (i - {x})

PROOF.

By definition,

   nodes(g) - (i - {x}) =
   dom g + range g - (i - {x})

Note that x in dom g, so the above is equivalent to

   dom g + range g - i + {x}.

By identity,

   dom g + range g - i + {x} =
   {e(1) : e in g} + {e(2) : e in g} - (i - {x}) =
   {e(1) : e in g | e(1) ~in i} +
   {e(2) : e in g | e(2) ~in i} + {x}.

By definition of collapse,

   g1 = {[u, v] in g | u ~in i} +
        {[x, v] : [u, v] in g | u in i & v ~in i}

By identity,

   nodes(g1) =
   dom g1 + range g1 =

(1)   {e(1) : e in g | e(1) ~in i} +
      if E [u, v] in g | u in i & v ~in i then {x} else {} +
      {e(2) : e in g | e(1) ~in i} +
      {e(2) : e in g | e(1) in i & e(2) ~in i}.

Note that

   E u | [u, x] in g & u ~in i

since there is a path from the root r ~in i to i, and i is single entry. Therefore,

   x in {e(2) : e in g | e(1) ~in i}

so that the formula (1) simplifies to

   {e(1) : e in g | e(1) ~in i} +
   {x} +
   {e(2) : e in g | e(1) ~in i} +
   {e(2) : e in g | e(2) ~in i}

Since e(1) ~in i -> e(2) = x or e(2) ~in i, the above simplifies to

   {e(1) : e in g | e(1) ~in i} +
   {x} +
   {e(2) : e in g | e(2) ~in i} =
   nodes(g) - (i - {x}).

QED.
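The definition of collapse used in this proof, and the node identity it yields, can be tried out on a small example. The Python sketch below encodes graphs as sets of pairs; the helper names and the sample graph (with interval i = {b, c} entered only through x = b) are assumptions made for illustration.

```python
def nodes(g):
    """All endpoints of the edges of g."""
    return {w for e in g for w in e}


def collapse(g, i, x):
    """Keep edges leaving nodes outside i; redirect exits of i to start at x."""
    kept = {(u, v) for (u, v) in g if u not in i}
    redirected = {(x, v) for (u, v) in g if u in i and v not in i}
    return kept | redirected


g = {("r", "a"), ("a", "b"), ("b", "c"), ("c", "b"), ("c", "d")}
i, x = {"b", "c"}, "b"
g1 = collapse(g, i, x)
```

For this graph the collapsed node set equals nodes(g) minus the non-head members of the interval, and it is strictly smaller than nodes(g).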
The next lemma follows immediately from Lemma 2.2.

LEMMA 2.3

   #nodes(g1) < #nodes(g)

Lemma 2.3 is used in subsequent induction proofs. The next lemma states that if p1 is a path from x1 to x2 in the collapsed graph g1 then there is a path p from x1 to x2 in g such that the range of p1 is equal to the range p - (i - {x}). We first make the following definitions. We define pathinv(v) to be a path p' from the interval head x to v, such that the range of p'(1:#p'-1) is a subset of the interval i.

   pathinv(v) = arb {p' in paths(g, x, v) |
                     range p'(1:#p'-1) subset i}

We form a path p using the following recursive definition:

   defpath(q) = if #q = 1 then q
                elseif [q(#q-1), q(#q)] ~in g then
                   defpath(q(1:#q-2)) || pathinv(q(#q))
                else
                   defpath(q(1:#q-1)) || [q(#q)]
                end

We now prove that defpath(p1) is a path in g.

LEMMA 2.4

   ispath(p1, g1) & p = defpath(p1)
   ->
   p in paths(g, p1(1), p1(#p1)) & range p1 = range p - (i - {x})

PROOF.

For any edge [u, v] in p1, if [u, v] ~in g, then by definition of collapse,

   u = x & E w in i | [w, v] in g.

Moreover, since i is strongly connected,

   E pxv in paths(g, x, v) | range pxv(1:#pxv-1) subset i.

Then pathinv(v) is defined if [x, v] is an edge of g1 and not of g. We use induction schema (TH6) to prove that defpath(p1) satisfies the lemma. We assume that the lemma is false for p1, but true for all paths which are shorter, and then derive a contradiction.

Case I. #p1 = 1

Then

   defpath(p1) = p1.

In this case it is clear that the lemma is true for p1.

Case II. [p1(#p1-1), p1(#p1)] ~in g

Then

   defpath(p1) = defpath(p1(1:#p1-2)) || pathinv(p1(#p1))

Let

   p2 = defpath(p1(1:#p1-2))

We know that

   p2 in paths(g, p1(1), p1(#p1-2)) &
   range p1(1:#p1-2) = range p2 - (i - {x})

Also,

   pathinv(p1(#p1)) in paths(g, x, p1(#p1))

and

   range pathinv(p1(#p1)) subset i + {p1(#p1)}.

Therefore,

   p2 || pathinv(p1(#p1)) in paths(g, p1(1), p1(#p1))

and

   range p1 = range defpath(p1) - (i - {x}).

Case III. [p1(#p1-1), p1(#p1)] in g

Then

   defpath(p1) = defpath(p1(1:#p1-1)) || [p1(#p1)].

Again, by assumption, if

   p2 = defpath(p1(1:#p1-1)),

then

   range p1(1:#p1-1) = range p2 - (i - {x})

so

   range p1 = range defpath(p1) - (i - {x})

since p1(#p1) ~in i - {x}. Also,

   p2 in paths(g, p1(1), p1(#p1-1))

so

   defpath(p1) in paths(g, p1(1), p1(#p1)).

QED.

Next, we show that if p is a path in g, and p(1) is not in i - {x}, then we can collapse p to obtain a path p1 in g1, such that p1(1) = p(1). We first make the following inductive definition:

   defpath(q) = if #q = 1 then q
                elseif q(#q) in i & q(#q-1) in i then
                   defpath(q(1:#q-1))
                else
                   defpath(q(1:#q-1)) || [q(#q)]
                end

LEMMA 2.5

   ispath(p, g) & p(1) ~in i - {x}
   & p1 = defpath(p)
   ->
   if p(#p) ~in i - {x} then
      p1 in paths(g1, p(1), p(#p)) & range p1 subset range p
   else
      p1 in paths(g1, p(1), x) & range p1 subset range p
   end

PROOF. 

We  assume  the  lemma  is  false  for  p,  but  true  for  all  paths  which 
are  shorter  than  p. 

Case I. #p = 1

Then the lemma follows immediately, since p1 = p.
Case II. p(#p) in i & p(#p-1) in i
Then

defpath(p) = defpath(p(1:#p-1))

By assumption,

defpath(p(1:#p-1)) in paths(g1, p(1), x)

& range defpath(p(1:#p-1)) subset range p(1:#p-1)

since

p(#p-1) in i.

Therefore,

p1 in paths(g1, p(1), x) & range p1 subset range p.

Case III. p(#p) ~in i or p(#p-1) ~in i.

Then

defpath(p) = defpath(p(1:#p-1)) || [p(#p)]

Suppose

p(#p-1) in i.

Then

defpath(p(1:#p-1)) in paths(g1, p(1), x)

and

[x, p(#p)] in g1

by definition of collapse.

Therefore,

defpath(p) in paths(g1, p(1), p(#p))

Suppose

p(#p-1) ~in i.

Then

p(#p) in i,

and therefore

p(#p) = x,
since i is single entry.
By assumption

defpath(p(1:#p-1)) in paths(g1, p(1), p(#p-1))
Therefore,

defpath(p) in paths(g1, p(1), x)

QED. 

To  prove  the  recursive  version  of  our   algorithm,   we  must   show 
that  collapsing  the  graph  does  not  spoil  the  input  assumptions. 

LEMMA  2.6 

Suppose, 

flowgraph(g, r) & g1 = collapse(g, i, x) &
r ~in range g & strong-interval(i, g, x) &
(A e in g | e(1) ~= e(2))

We  prove  each  of  the  following  claims. 

(i) flowgraph(g1, r)

PROOF.

We must show:

(A n in nodes(g1) | haspath(g1, r, n))
We know from Lemma 2.2 that

nodes(g1) = nodes(g) - (i - {x})
We also know from Lemma 2.1 that

r ~in i
By properties of haspath,

haspath(g1, r, r).
For n ~= r,

n in nodes(g1) -> n ~in i-{x}.
Therefore, we can apply Lemma 2.5 to prove (i).
QED.
(ii) r ~in range g1
PROOF.

By definition of collapse,

g1 = {[u, v] in g | u ~in i} +
     {[x, v] : [u, v] in g | u in i & v ~in i}

Therefore,

range g1 subset range g
QED. 
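
As an illustration of the definition of collapse used in this proof, here is a minimal Python sketch. It assumes a graph is a set of (source, target) pairs, which is a representation chosen for this example only. Edges leaving nodes outside i are kept, edges from i to the outside are redirected to start at the head x, and edges internal to i disappear.

```python
def collapse(g, i, x):
    """Collapse interval i (a set of nodes with head x) in edge set g."""
    kept = {(u, v) for (u, v) in g if u not in i}
    redirected = {(x, v) for (u, v) in g if u in i and v not in i}
    return kept | redirected


def edge_range(g):
    """The set of edge targets, written 'range g' in the text."""
    return {v for (_, v) in g}
```

Every target in the collapsed graph is already a target in g, which is the containment claim (ii) needs; and no self-loop can be introduced, because a redirected edge always ends outside i while x lies in i.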

(iii) (A e in g1 | e(1) ~= e(2))

PROOF.

By hypothesis,

e in {[u, v] in g | u ~in i} ->
e(2) ~= e(1).

If

e in {[x, v] : [u, v] in g | u in i & v ~in i}
then we note that, by definition of interval, x in i, which implies

x ~= v.
We conclude

e in g1 -> e(1) ~= e(2).
QED.

5.2.3 Depth First Spanning Tree Properties

The lemmas in this section state properties of depth first spanning trees of flow graphs. We will assume in these lemmas the hypothesis:

[t, n] in depth-first-trees(g, r)

We  first  prove  properties  of  paths  in  trees. 

LEMMA  3.1 

p in paths(t-inv, x, u) & p1 in paths(t-inv, x, v)

->
if #p < #p1 then

p = p1(1:#p)
else

p1 = p(1:#p1)
end 

PROOF. 

Suppose #p < #p1 and p ~= p1(1:#p). Then we let k be the index of the first elements of p and p1 which are not equal.

E 1 < k <= #p | p(k) ~= p1(k) &

A 1 < j <= #p | p(j) ~= p1(j) -> j >= k

Since k > 1, it follows that

[p(k-1), p(k)] in t-inv & [p(k-1), p1(k)] in t-inv,
but

p(k) ~= p1(k).
This contradicts the fact that

t-inv : smap.

QED. 

As  a  result  of  this  lemma,  we  make  use  of  the  fact  that 

haspath(t-inv, x, y) & x ~= y

->
haspath(t-inv, t-inv(x), y)

We next show that a tree has no cycles.

LEMMA 3.2
nocycles(t)
PROOF.
Suppose

E p in paths(t-inv, x1, x1) | iscycle(p, t-inv)
Then

r ~in range p

since

r ~in dom t-inv.
We know by definition that

E p1 in simplepaths(t-inv, x1, r)
Therefore, we apply Lemma 3.1 to obtain the following two cases:
Case I. p1 = p(1:#p1)

This is impossible since r ~in range p.
Case II. p = p1(1:#p).
This implies that p1 is not simple.
QED. 
LEMMA  3.3 

haspath(t, x1, x2) -> ~haspath(t, x2, x1)
PROOF.
Suppose

haspath(t-inv, x2, x1) & haspath(t-inv, x1, x2)
Then there is a cycle in t-inv.
QED. 

LEMMA 3.4

n(y) > n(x) & n(y) < n(x) + numdescs(t, x)
->
haspath(t, x, y)

PROOF. 

We prove this lemma by showing that if y is not a descendant of x, then n is not one-one.

Suppose

~haspath(t, x, y)
We define n1 to be n restricted to the descendants of x. Let

n1 = n | {z | haspath(t, x, z)}.
Then since n is one-one, n1 is also. We claim that

range n1 = {n(x) <= k < n(x) + numdescs(t, x)}
We know from the definition of tree node map that

range n1 subset {n(x) <= k < n(x) + numdescs(t, x)}.
Furthermore,

#dom n1 = numdescs(t, x).

We can therefore apply metatheorem (TH3) to prove the claim. Finally, y ~in dom n1 & n(y) in range n1 implies that n is not one-one. Therefore,

haspath(t, x, y).
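
The interval property stated by Lemma 3.4 can be checked on a small tree with the following Python sketch. It assumes a tree is given as a dictionary from a node to its ordered list of children, that n is the preorder numbering starting at 1, and that numdescs counts a node among its own descendants (as the proof above does, since haspath(t, x, x) holds). All function names are illustrative.

```python
def preorder_numbers(children, root):
    """Number the nodes 1, 2, ... in depth first (preorder) order."""
    n = {}

    def visit(v):
        n[v] = len(n) + 1
        for c in children.get(v, []):
            visit(c)

    visit(root)
    return n


def descendants(children, x):
    """All z with a tree path from x to z, including x itself."""
    out, stack = {x}, [x]
    while stack:
        v = stack.pop()
        for c in children.get(v, []):
            out.add(c)
            stack.append(c)
    return out


def interval_property_holds(children, root):
    """Check: n(x) < n(y) < n(x) + numdescs(x) implies y descends from x."""
    n = preorder_numbers(children, root)
    for x in n:
        desc = descendants(children, x)
        for y in n:
            if n[x] < n[y] < n[x] + len(desc) and y not in desc:
                return False
    return True
```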

LEMMA 3.5

n(x) > n(v) & ~haspath(t, v, x)

->
A x1, v1 | haspath(t, x, x1) & haspath(t, v, v1)

-> n(x1) > n(v1)

PROOF. 

Suppose for some x1, v1

n(x1) < n(v1)
We know by definition that

n(x1) >= n(x)

so

n(x1) > n(v).
But then

n(x1) < n(v) + numdescs(t, v),

so by the previous lemma

haspath(t, v, x1).

Contradiction. 

QED. 

We next show that the following three statements are equivalent:

(i) E [t, n] in depth-first-trees(g, r), e in g |
backedge(e, t)

(ii) ~cyclefree(g)

(iii) A [t, n] in depth-first-trees(g, r) | E e in g |
backedge(e, t)

Statement (i) follows immediately from (iii). We proceed to prove that (i) implies (ii), and then that (ii) implies (iii).
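
The equivalence can be illustrated on small graphs with the Python sketch below. It assumes a graph is a set of (source, target) pairs, builds one depth first spanning tree by visiting successors in sorted order (an arbitrary choice made for determinism), and treats an edge [u, v] as a back edge when v is a tree ancestor of u. The cycle test uses Kahn's topological sort as an independent check; none of these function names come from the thesis.

```python
def dfs_tree(g, r):
    """Return (t, n): tree edges and preorder numbers of a DFS from r."""
    succ = {}
    for u, v in g:
        succ.setdefault(u, []).append(v)
    t, n = set(), {}

    def visit(u):
        n[u] = len(n) + 1
        for v in sorted(succ.get(u, [])):
            if v not in n:
                t.add((u, v))
                visit(v)

    visit(r)
    return t, n


def is_ancestor(t, a, b):
    """True when there is a tree path from a to b (a == b allowed)."""
    parent = {v: u for (u, v) in t}
    while True:
        if b == a:
            return True
        if b not in parent:
            return False
        b = parent[b]


def back_edges(g, t):
    """Edges [u, v] of g whose target v is a tree ancestor of u."""
    return {(u, v) for (u, v) in g if is_ancestor(t, v, u)}


def has_cycle(g):
    """Independent cycle test: Kahn's algorithm fails to remove all nodes."""
    nodes = {u for e in g for u in e}
    indeg = {v: 0 for v in nodes}
    for _, v in g:
        indeg[v] += 1
    ready = [v for v in nodes if indeg[v] == 0]
    removed = 0
    while ready:
        u = ready.pop()
        removed += 1
        for a, b in g:
            if a == u:
                indeg[b] -= 1
                if indeg[b] == 0:
                    ready.append(b)
    return removed < len(nodes)
```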

LEMMA 3.6

E [t, n] in depth-first-trees(g, r), e in g | backedge(e, t)

->
E p | iscycle(p, g)

PROOF.

Let

[u, v] = e
and

pt = arb paths(t, v, u)
We know such a path exists by definition of backedge. Let

p' = pt || [v].

It is clear that

p' in paths(g, v, v) & #p' > 1
Therefore,

E p | cycle(p) & p in paths(g, v, v).
QED.
We next show that if g contains a cycle, then every depth first spanning tree has a backedge.

LEMMA 3.7

iscycle(p, g) & [t, n] in depth-first-trees(g, r)

->
E e in edges(p) | backedge(e, t)

PROOF. 

We  first  make  the  following  recursive  definition  of  a  right  to 
left  tree  walk. 

nright(z) = if z ~in dom t-inv then 1 else
            nright(t-inv(z)) + 1 +
            #{u | haspath(t, t-inv(z), u) & n(u) > n(z)}
            end

We show in the next two claims that if e is an edge which is a forward edge, a tree edge or a cross edge, then nright(e(1)) < nright(e(2)).

CLAIM  1. 

A y, x | haspath(t, y, x) -> nright(y) <= nright(x)

Let

p = arb paths(t-inv, x, y)
We prove this claim by induction on

#p
We suppose

nright(y) > nright(x)
and derive a contradiction.
Case I. y = x
Then

nright(y) = nright(x)
Case II. y ~= x.
We know that x in dom t-inv and

nright(x) = nright(t-inv(x)) + 1 + F
where F is a blobbed formula. Therefore,

nright(x) > nright(t-inv(x))
But we can now apply the inductive hypothesis since the path from t-inv(x) to y is shorter than the path from x to y. We conclude

nright(t-inv(x)) >= nright(y)
Therefore,

nright(y) < nright(x).

This  completes  the  proof  of  the  claim. 

We next claim that if e is a cross edge, then nright(e(1)) < nright(e(2)).

CLAIM 2.

~haspath(t, u, v) & ~haspath(t, v, u)
-> nright(u) < nright(v)

We  prove  this  by  induction  on 

#{w  in  nodes(t)  |  haspath(t,  w,  v)  or  haspath(t,  w,  u)} 

Suppose 

nright(u)  >  nright(v). 

We note that u ~= r & v ~= r since there is no path from u to v or v to u. Therefore,

u in dom t-inv & v in dom t-inv.
Let

u1 = t-inv(u) & v1 = t-inv(v)

Case I. u1 = v1

Then since n(u) > n(v), by theorem (TH2),

nright(u) = nright(u1) + 1 +
            #{w | haspath(t, u1, w) & n(w) > n(u)} <
            nright(u1) + 1 +
            #{w | haspath(t, u1, w) & n(w) > n(v)} =
            nright(v).

Therefore,  we  have  arrived  at  a  contradiction. 

Case II. u1 ~= v1

Then either

~haspath(t, u1, v) or ~haspath(t, v1, u)
since suppose

haspath(t-inv, v, u1) & haspath(t-inv, u, v1).
Then

haspath(t-inv, v1, u1) & haspath(t-inv, u1, v1).
Let us assume

~haspath(t, u1, v).
Then we know that

n(u1) > n(v).

(Otherwise, n(u) < n(v) by Lemma 3.5.) We can therefore apply the inductive hypothesis and conclude that

nright(u1) < nright(v).

But then this implies that

nright(u) < nright(v).

This completes the proof of the claim. We now know that for all edges e of g which are not back edges,

nright(e(1)) < nright(e(2))
Suppose now that

A 1 <= k < #p | ~backedge(p(k:k+1), t)
Then

A 1 <= k < #p | nright(p(k)) < nright(p(k+1))
We then apply the theorem (TH5) and conclude that

nright(p(1)) < nright(p(#p)).
But

p(1) = p(#p)
This gives a contradiction.
QED. 

The following Lemma is Lemma 2 from section 5.1.2.
LEMMA 3.8
(LEMMA 2.)

The  formulae  (i),  (ii),  and  (iii)  above  are  equivalent. 

PROOF. 

Follows  from  Lemma  3.7  and  Lemma  3.6. 

QED.

5.2.4 Properties of Reachunder

In  this  section  we  prove  properties  of  reachunder  which  will 
be  needed  to  prove  Tarjan's  irreducibility  criterion.  In 
particular,  we  will  show  that  if  x  is  the  back  edge  target  with 
highest  node  number,  and  if  r  is  not  in  the  reachunder  set  of  x, 
then  the  reachunder  set  together  with  x  is  a  strong  interval  with 
head  x. 

The  lemmas  in  this  section  have  the  following  assumptions  as 
implicit  hypotheses: 

[t, n] in depth-first-trees(g, r)
targ-back-edge(x, t, g)
i = reachunder(g, t, x) + {x}
r ~in i

The definition of reachunder is:

reachunder(g, t, x) =
{y in nodes(g) |

(E z in nodes(g) | backedge([z, x], t) &

[z, x] in g &
haspath({[x2, x1] in g-inv | x1 ~= x}, z, y))}
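
A minimal Python sketch of this set follows. It assumes graphs and trees are sets of (source, target) pairs and follows the criterion of Lemma 4.1: a node y other than x belongs to the set when some source z of a back edge into x can be reached from y along a path that avoids x. The sketch therefore never returns x itself, and the backward worklist search is simply one convenient way to compute the set, not a method taken from the thesis.

```python
def reachunder(g, t, x):
    """Nodes (other than x) that reach a back-edge source of x while avoiding x."""
    parent = {v: u for (u, v) in t}

    def is_ancestor(a, b):
        # True when the tree t has a path from a to b.
        while True:
            if b == a:
                return True
            if b not in parent:
                return False
            b = parent[b]

    # Sources z of back edges [z, x]: x is a tree ancestor of z.
    sources = {z for (z, w) in g if w == x and z != x and is_ancestor(x, z)}

    pred = {}
    for u, v in g:
        pred.setdefault(v, set()).add(u)

    # Walk edges backwards from the sources, never stepping onto x.
    reached, work = set(sources), list(sources)
    while work:
        v = work.pop()
        for u in pred.get(v, ()):
            if u != x and u not in reached:
                reached.add(u)
                work.append(u)
    return reached
```

With i taken as the result plus {x}, this is the candidate interval whose properties the lemmas of this section establish.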

We first provide a simpler criterion for y in reachunder(g, t, x).

LEMMA 4.1

y in reachunder(g, t, x) & y ~= x <->
E z in nodes(g) | backedge([z, x], t) & [z, x] in g &
E p in paths(g, y, z) | x ~in range p

PROOF. 

From the definition, y in reachunder(g, t, x) is equivalent to

E z in nodes(g) | backedge([z, x], t) & [z, x] in g &
haspath({[x2, x1] in g-inv | x1 ~= x}, z, y)

Then by identity,

haspath({[x2, x1] in g-inv | x1 ~= x}, z, y) <->
haspath({[x2, x1] in g-inv | x1 ~= x}-inv, y, z)
which is equivalent to

haspath({[x2, x1] in g | x1 ~= x}, y, z)
Suppose

p in paths({[x2, x1] in g | x1 ~= x}, y, z)
Then

p in paths(g, y, z).
If x in range p then

E [x2, x] in edges(p)
which is a contradiction. Conversely, if

p in paths(g, y, z) & x ~in range p,
then

p in paths({[x2, x1] in g | x1 ~= x}, y, z)

QED. 

The  next  lemma  states  that  under  the  above  hypotheses,  all   nodes 
in  i  are  descendants  of  x  in  the  tree  t. 

LEMMA 4.2

A y in i | haspath(t, x, y)

PROOF.

Suppose

E y in reachunder(g, t, x) | ~haspath(t, x, y)
We know then that

E p in paths(t, r, y) | x ~in range p
since there is a tree path from the root to any node, and if

x in range p,

then haspath(t, x, y) would be true. Since y in reachunder(g, t, x),

E z, p1 in paths(g, y, z) | x ~in range p1 &
backedge([z, x], t) & [z, x] in g
Let

pr = p(1:#p-1) || p1

Clearly,

pr in paths(g, r, z) & x ~in range pr &
backedge([z, x], t) & [z, x] in g

Therefore,

r in reachunder(g, t, x),
which contradicts the hypothesis r ~in i. Next, we note that

haspath(t, x, x)

by properties of haspath. Therefore, the lemma holds.

QED. 

The  next  lemma  states  that  the  set  i  is  strongly  connected. 

LEMMA   4.3 

A x1, x2 in i | E p in paths(g, x1, x2) |
range p subset i

PROOF. 

Let x1 and x2 be elements of i such that x1 ~= x and x1 ~= x2. Since x1 in reachunder(g, t, x),

E p in paths(g, x1, z) | backedge([z, x], t) &
[z, x] in g & x ~in range p

Let

p1 = arb paths(t, x, x1).
We know such a path exists by the previous lemma.
CLAIM. range p1 subset i
Let

v = arb {range p1 | v ~= x & v ~= x1}.
(If there is no such v, we are done.) Then let

p2 = p1(p1-inv(v) : #p1)

x ~in range p2
since p1 is a tree path. Let

p3 = p2(1:#p2-1) || p

It is clear that

p3 in paths(g, v, z) & x ~in range p3 &
backedge([z, x], t) & [z, x] in g
Therefore,

v in reachunder(g, t, x).

This completes the proof of the claim. Now we show there is a path from x1 to x2 which is in i.

Case I. x2 in range p1.

In this case, we are done.

Case II. x2 ~in range p1.

Then let

p4 = arb paths(t, x, x2)

As above, range p4 subset i. Therefore, we let

p' = p || p4
and it follows that

p' in paths(g, x1, x2) & range p' subset i.
QED. 

The  next  lemma  states  that  i  is  single  entry. 
LEMMA   4.4 

A [y, z] in g | y ~in i & z in i -> z = x
PROOF.
Suppose

E [y, z] in g | y ~in i & z in i & z ~= x
Then since z ~= x,

z in reachunder(g, t, x),

so

E w, p in paths(g, z, w) | x ~in range p &
[w, x] in g & backedge([w, x], t)
Since y ~in i, y ~= x, so if we let

p' = [y] || p

then

p' in paths(g, y, w) & x ~in range p' &

[w, x] in g & backedge([w, x], t)

Therefore,

y in i.
QED. 
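
The single entry condition of Lemma 4.4 is easy to test directly on a concrete graph. The following Python sketch assumes a graph is a set of (source, target) pairs and i is a set of nodes with head x; the helper names are illustrative. It lists the edges that enter i from outside at a node other than x, so the condition holds exactly when that list is empty.

```python
def entry_violations(g, i, x):
    """Edges [y, z] with y outside i and z inside i, where z is not the head x."""
    return {(u, v) for (u, v) in g if u not in i and v in i and v != x}


def is_single_entry(g, i, x):
    """True when every edge entering i from outside enters at x."""
    return not entry_violations(g, i, x)
```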

We  next  show  that  i  has  at  least  one  cycle. 
LEMMA   4.5 

E  p  in  paths(g,  x,  x)  |  iscycle(p,  g)  &  range  p  subset  i 
PROOF. 
Since x is a back edge target,

E z | [z, x] in g & backedge([z, x], t)
Furthermore,

z in reachunder(g, t, x)
so

z in i.
By Lemma 4.3,

E p1 in paths(g, x, z) | range p1 subset i
Therefore, if we let

p = p1 || [x]

then

p in paths(g, x, x) & range p subset i
We can assume, by Lemma 1.6, that p is a cycle.
QED. 
For i to be a strong interval, all cycles in i must contain x. However, in general, this is not the case. This will be true if x is the back edge target with the largest node number. We therefore add this additional assumption.

LEMMA  4.6 

n(x) = max / n[targ-back-edges(t, g)]

->
A p | iscycle(p, g) & range p subset i ->
x in range p

PROOF. 

Suppose 

(E  p  I  iscycle(p,  g)  &  range  p  subset  i  &  x  ~in  range  p) 

Since p is a cycle, there is a backedge [u, v] in range p. By hypothesis,

u in i & v in i

By Lemma 4.2,

haspath(t, x, v).

Since x ~in range p, v ~= x. Therefore,

n(x) < n(v).

However,  this  violates  the  choice  of  x  as  the  back  edge  target 
with  the  highest  node  number. 

QED. 

Finally,  we  conclude  from  the  above   lemmas   that   if  x   is   the 

backedge   target  with  the  highest  node  number,  then  i  is  a  strong 

interval.   The  following  lemma  is  Lemma  4  from  section  5.1.4. 


LEMMA   4.7 
(LEMMA  4.) 

n(x) = max / n[targ-back-edges(t, g)]

-> 
strong-interval(i,  g,  x) 

PROOF.
Follows immediately from Lemmas 4.3, 4.4, 4.5, and 4.6.
QED.

5.2.5 Properties of Collapse-tree
Suppose that we collapse a graph g with strong interval i to obtain a graph g1. If t is a depth first spanning tree of g with node map n, and if we let

[t1, n1] = collapse-tree(t, n, i, x)
then we show that t1 with node map n1 is a depth first spanning tree of g1. Secondly, we show that if i is a strong interval with head x, and x is a back edge target with greatest node number, then

targ-back-edges(t1, g1) = targ-back-edges(t, g) - {x}.
Let us first review the definition of collapse-tree given in the previous section of this chapter. If n is a tree node map, then we obtain n1 as follows:

collapse-map(t, n, i, x) =

{[y, n(y)] : y in nodes(t) | n(y) <= n(x)} +

{[y, n(y) - #{z in i | n(z) < n(y)} + 1] : y in nodes(t) - i
    | haspath(t, x, y)} +

{[y, n(y) - #i + 1] : y in nodes(t) | n(y) > n(x) &
    ~haspath(t, x, y)}

Then collapse-tree is defined by:

collapse-tree(t, n, i, x) =

[collapse(t, i, x), collapse-map(t, n, i, x)]
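
The two definitions can be sketched in Python as follows. This is an illustration only, assuming a tree is a set of (parent, child) pairs, the node map n is a dictionary, and i is a set of nodes containing the head x; the three branches of `collapse_map` correspond to the three set formers above, and nodes of i other than x receive no new number.

```python
def descendants(t, x):
    """All y with a tree path from x to y, including x itself."""
    children = {}
    for u, v in t:
        children.setdefault(u, []).append(v)
    out, stack = {x}, [x]
    while stack:
        v = stack.pop()
        for c in children.get(v, []):
            out.add(c)
            stack.append(c)
    return out


def collapse(t, i, x):
    """Same collapse operation as for graphs, applied to the tree edges."""
    kept = {(u, v) for (u, v) in t if u not in i}
    redirected = {(x, v) for (u, v) in t if u in i and v not in i}
    return kept | redirected


def collapse_map(t, n, i, x):
    """Renumber the surviving nodes so that the numbering stays contiguous."""
    nodes = {u for e in t for u in e}
    desc = descendants(t, x)
    n1 = {}
    for y in nodes:
        if n[y] <= n[x]:
            n1[y] = n[y]                       # numbered up to and including x
        elif y in desc and y not in i:
            below = sum(1 for z in i if n[z] < n[y])
            n1[y] = n[y] - below + 1           # descendants of x outside i
        elif y not in desc:
            n1[y] = n[y] - len(i) + 1          # numbered after the subtree of x
    return n1


def collapse_tree(t, n, i, x):
    return collapse(t, i, x), collapse_map(t, n, i, x)
```

In the usage below, the interval {x, a} shrinks to x, and the remaining nodes are renumbered 1 through 4 in their original relative order.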


LEMMA 5.1
(LEMMA 5.)

i = reachunder(g, t, x) + {x} & r ~in i &

g1 = collapse(g, i, x) & [t, n] in depth-first-trees(g, r)
& n(x) = max / n[targ-back-edges(t, g)]
& [t1, n1] = collapse-tree(t, n, i, x)

->
[t1, n1] in depth-first-trees(g1, r)


To  prove  that  tl  and  nl  constitute  a  depth  first   spanning   tree, 
we  must  prove  the  following  facts: 

(i) nodes(t1) = nodes(g1)

(ii) t1 subset g1

(iii) t1-inv : smap

(iv) (A y in nodes(t1) | t1-inv-lim(y) = r)

(v) n1 : smap & dom n1 = nodes(t1) &

range n1 = {1 <= k <= #nodes(t1)}

(vi) A u, v | haspath(t1, u, v) -> n1(u) <= n1(v) &
n1(v) < n1(u) + numdescs(t1, u)

(vii) A [u, v] in g1 - t1 | haspath(t1, u, v) or
haspath(t1, v, u) or n1(u) > n1(v)

We  prove  each  conclusion  below. 

PROOF  OF  (i). 

We know by hypothesis that

nodes(g) = nodes(t).
By Lemma 2.2, we know that

nodes(g1) = nodes(g) - i + {x}
and also that

nodes(t1) = nodes(t) - i + {x}.
Therefore,

nodes(t1) = nodes(g1).

PROOF  OF  (ii). 

By definition of collapse,

t1 = {[u, v] in t | u ~in i} +

{[x, v] : [u, v] in t | u in i & v ~in i}

and

g1 = {[u, v] in g | u ~in i} +

{[x, v] : [u, v] in g | u in i & v ~in i}
Since t subset g, it follows that t1 subset g1.
PROOF  OF  (iii). 

We know that t-inv is a single valued map, and

t1-inv = {[v, u] in t-inv | u ~in i} +

{[v, x] : [v, u] in t-inv | u in i & v ~in i}

Let

s1 = {[v, u] in t-inv | u ~in i},
and

s2 = {[v, x] : [v, u] in t-inv | u in i & v ~in i}
Then

range s2 = {x},
so

s2 : smap
follows. Since s1 subset t-inv, it follows that

s1 : smap.
Then, since dom s1 * dom s2 = {},

s1 + s2 : smap
QED. 

PROOF  OF  (iv). 
By Lemma 2.6 (ii),

r ~in range g1.
Therefore,

r ~in dom g1-inv.
Since

t1-inv subset g1-inv,

r ~in dom t1-inv.
We next show that

(A y in nodes(t1) | haspath(t1-inv, y, r))
We know that

haspath(t-inv, y, r)

by hypothesis. This is equivalent to

haspath(t, r, y).
And since y is in the collapsed graph,

y ~in i - {x} & r ~in i
Therefore, by Lemma 2.5,

haspath(t1, r, y)
which is equivalent to

haspath(t1-inv, y, r).
Therefore,

r = t1-inv-lim(y).

QED. 

PROOF  OF  (v). 

Let

n11 = {[y, n(y)] : y in nodes(t) | n(y) <= n(x)},

n12 = {[y, n(y) - #{z in i | n(z) < n(y)} + 1] :
y in nodes(t) - i | haspath(t, x, y)}

and

n13 = {[y, n(y) - #i + 1] : y in nodes(t) | n(y) > n(x) &
~haspath(t, x, y)}

It is clear that

n1 = n11 + n12 + n13.
We calculate the domains and ranges of n11, n12, and n13.
By identity,

dom n11 = {y in nodes(t) | n(y) <= n(x)}
Since

nodes(t1) = nodes(t) - i + {x},
and
A y in i - {x} | n(y) > n(x),

dom n11 = {y in nodes(t1) | n(y) <= n(x)}.

Next,

dom n12 = {y in nodes(t) - i | haspath(t, x, y)}

Therefore,

dom n12 = {y in nodes(t1) - {x} | haspath(t, x, y)}

Finally,

dom n13 = {y in nodes(t) | n(y) > n(x) & ~haspath(t, x, y)}

We know that

A y in i | haspath(t, x, y).

Therefore,

dom n13 = {y in nodes(t1) |

n(y) > n(x) & ~haspath(t, x, y)}

We conclude, since x in dom n11, that

dom n11 + dom n12 + dom n13 = nodes(t1)

Next, we determine the ranges of these maps.

range n11 = {n(y) : y in nodes(t) | n(y) <= n(x)}

By hypothesis,

range n11 = {1 <= k <= n(x)}.

Next,

range n12 = {n(y) - #{z in i | n(z) < n(y)} + 1 :

y in nodes(t) - i | haspath(t, x, y)}.

We claim that this set is equivalent to

(1) {n(x) < k <= n(x) + #{y in nodes(t) - i |
haspath(t, x, y)}}

Let

s = {y in nodes(t) - i | haspath(t, x, y)}

and for y in s, we define a function f such that
f(y) = n(y) - #{z in i | n(z) < n(y)} + 1
Therefore, proving that range n12 is equal to (1) is the same as proving

(2) f[s] = (1).

We first show that for y in s, f(y) is in the set (1). From the definition of depth first spanning tree, we know that

A y | haspath(t, x, y) -> n(x) <= n(y) <= n(x) +
#{z | haspath(t, x, z)}

Let y be in s. We apply theorem (TH4), since n is one-one, and conclude

n(y) >= n(x) + #{z in i | n(z) < n(y)}
Therefore,

n(y) - #{z in i | n(z) < n(y)} + 1 > n(x).
It follows that

f(y) > n(x).

We next establish an upper bound for f(y). Again applying theorem (TH4), and using the fact that the descendants of x have node numbers strictly below n(x) + #{z | haspath(t, x, z)},

n(y) < n(x) + #{z | haspath(t, x, z)} -

#{z in i | haspath(t, x, z) & n(z) > n(y)}

Therefore,

n(y) - #{z in i | n(z) < n(y)} + 1 <
n(x) + #{z | haspath(t, x, z)} -
#{z in i | haspath(t, x, z) & n(z) > n(y)} -
#{z in i | n(z) < n(y)} + 1 =

n(x) + #{z in i | haspath(t, x, z)} +

#{z in nodes(t) - i | haspath(t, x, z)} -
#{z in i | n(z) > n(y)} -
#{z in i | n(z) < n(y)} + 1 =

n(x) + #{z in nodes(t) - i | haspath(t, x, z)} + 1,

since every node of i is a descendant of x and y ~in i.

Therefore,

f(y) <= n(x) + #{z in nodes(t) - i | haspath(t, x, z)}
We have shown that

range f subset (1)
It is clear that
#dom f = #(1)
Therefore, to prove equality, we must show that f is onto. Since the two sets have the same size, it suffices to show that f is one-one, which follows if f is strictly increasing in n.
Suppose

y1 in s, y2 in s, n(y1) < n(y2).
Let

h(y) = #{z in i | n(z) < n(y)}.
If

h(y1) = h(y2),
then it follows that

f(y1) < f(y2).
Suppose

h(y1) < h(y2).
Note that h(y1) > h(y2) is impossible.
Then

n(y2) > n(y1) + h(y2) - h(y1)
also by Theorem (TH4).
Therefore,

n(y2) - h(y2) > n(y1) - h(y1).
This completes the proof of the claim.
Next we consider the range of n13.

range n13 = {n(y) - #i + 1 : y in nodes(t) |
n(y) > n(x) & ~haspath(t, x, y)}

By hypothesis,

{n(y) : y in nodes(t) | n(y) > n(x) & ~haspath(t, x, y)} =
{n(x) + #{z | haspath(t, x, z)} < k <= #nodes(t)}

Therefore,

range n13 =

{n(x) + #{z | haspath(t, x, z)} - #i + 1 < k <=
#nodes(t) - #i + 1}

This in turn is equivalent to
{n(x) + #{z in nodes(t1) | haspath(t, x, z)} < k <=
#nodes(t1)}

We conclude that

range n1 = {1 <= k <= #nodes(t1)}.

Furthermore, since the map is single valued, and the size of the domain and range are the same,

n1 : bmap.

QED. 

We  show  that  collapse-map  preserves   the   order  of  nodes.   The 

proofs  of  (vi)  and  (vii)  then  follow  immediately. 

CLAIM. 

A x1, x2 in nodes(t1) | n(x1) < n(x2) -> n1(x1) < n1(x2)

First it is clear that

x1 in dom n11 & x2 ~in dom n11 -> n1(x1) < n1(x2)

and

x1 in dom n12 & x2 in dom n13 -> n1(x1) < n1(x2)

We consider the other cases. Suppose

x1 in dom n11 & x2 in dom n11.
Then

n1(x1) = n(x1) & n1(x2) = n(x2)
so

n1(x1) < n1(x2)
Next, suppose

x1 in dom n12 & x2 in dom n12.
As shown above, this implies that

n1(x1) < n1(x2).
Finally, if

x1 in dom n13 & x2 in dom n13,
then

n1(x1) = n(x1) - #i + 1,

and

n1(x2) = n(x2) - #i + 1.
Since we are subtracting a constant,
n1(x1) < n1(x2).

In  the  next  lemma  we  show  that: 

(i) If [v, x1] is a backedge of g1 with respect to t1, then x1 is a back edge target of g with respect to t: either [v, x1] is a back edge of g or there is a u in i such that [u, x1] is a back edge and v is x.

It follows immediately that

(ii) the set of backedge targets of g1 is a subset of the backedge targets of g.

Next we show that

(iii) the backedge targets of g1 are the backedge targets of g less x.

Finally, suppose we consider the back edges of g1 with respect to t and with respect to t1. Then

(iv) the back edges of g1 with respect to t1 are back edges with respect to t.

LEMMA 5.2
(LEMMA 6.)

n(x) = max / n[targ-back-edges(t, g)] &
i = reachunder(g, t, x) + {x} & r ~in i
& g1 = collapse(g, i, x) &
[t, n] in depth-first-trees(g, r) &
[t1, n1] = collapse-tree(t, n, i, x)

->
(i) backedge([v, x1], t1) & [v, x1] in g1

->
backedge([v, x1], t) & [v, x1] in g

or
v = x & E u in i | [u, x1] in g & backedge([u, x1], t)

(ii) targ-back-edges(t1, g1) subset targ-back-edges(t, g)

(iii) targ-back-edges(t1, g1) = targ-back-edges(t, g) - {x}
(iv) {e in g1 | backedge(e, t1)} = {e in g1 | backedge(e, t)}

PROOF  OF  (i). 

Suppose 

backedge([v, x1], t1) & [v, x1] in g1
Then by definition of collapse,

[v, x1] in g or v = x & E u in i | [u, x1] in g
Case I. [v, x1] in g
Since [v, x1] is a backedge in t1, by definition of backedge,

haspath(t1, x1, v)
By Lemma 2.4,

haspath(t, x1, v).
Therefore,

backedge([v, x1], t)

Case II. v = x & E u in i | [u, x1] in g

Since u is in i and i is a strong interval with head x, we apply Lemma 4.2 to obtain

haspath(t, x, u)

Since v = x, [x, x1] is a backedge in t1, and by definition of backedge,

haspath(t1, x1, x).
Therefore,

haspath(t, x1, x)
Since haspath is transitive,

haspath(t, x1, u),
so by definition of backedge,

backedge([u, x1], t)
QED. 
PROOF OF (ii).
This follows immediately from (i). If x1 is a back edge target in g1 with respect to t1, then

E v in nodes(g1) | [v, x1] in g1 & backedge([v, x1], t1)

By (i),

E u in nodes(g) | [u, x1] in g & backedge([u, x1], t)

QED. 

PROOF  OF    (iii). 

We must show that x is not a back edge target of g1 with respect to t1, and if x1 ~= x is a back edge target of g with respect to t, then x1 is a back edge target of g1 with respect to t1.

First, suppose

backedge([u, x1], t) & [u, x1] in g
for some x1 ~= x.
We claim that

x1 ~in i.

Otherwise, x1 is a descendant of x, and therefore has a higher node number. So by definition of collapse,

[u, x1] in g1 or [x, x1] in g1.
Since [u, x1] is a backedge of t,

haspath(t, x1, u),
so

haspath(t1, x1, u) or haspath(t1, x1, x)
which implies

backedge([u, x1], t1).
Next, we show that x is not a backedge target in g1.
Suppose

E u in nodes(g1) | backedge([u, x], t1) & [u, x] in g1
Then

backedge([u, x], t) & [u, x] in g
But since u in nodes(g1),
u ~in i.

Since [u, x] is a backedge, u in reachunder(g, t, x). Therefore,

u in i,
which is a contradiction.
PROOF  OF  (iv). 
Suppose 

e in g1 & backedge(e, t1)
Then

haspath(t1, e(2), e(1)).
By Lemma 2.4,

haspath(t, e(2), e(1)).
Therefore,

backedge(e, t)
by definition. Similarly,

e in g1 & backedge(e, t)
implies

backedge(e, t1)
by Lemma 2.5.
QED. 
5.2.6  Tarjan's Reducibility Criterion


Our next goal is to prove that Tarjan's Reducibility
Criterion is equivalent to our definition of reducibility. We
first introduce a third characterization of reducibility, which
we will refer to as the Hecht-Ullman criterion. We will then
show that the three reducibility criteria are equivalent by
showing that

(i)  Hecht-Ullman criterion -> ~reducible(g, r) by definition
(ii)  Tarjan's criterion -> Hecht-Ullman criterion
(iii)  ~reducible(g, r) by definition -> Tarjan's criterion
The  Hecht-Ullman  criterion  states  precisely   the   conditions 
under  which   a  flowgraph  g  has  a  double  entry  loop  ;   namely,  if 
there  are  two  nodes  xl  and  x2 ,  and  four  paths  -  pi  from  r  to   xl, 
p2   from  r   to  x2,  p3  from  xl  to  x2,  and  p4  from  x2  to  xl  -  such 
that  x2  is  not  in  pi,  xl  is  not  in  p2,  and  paths  pi  and  p2  do  not 
intersect   paths   p3   and   p4,   except   at  nodes  xl  and  x2.   More 
formally,  we  state  the  following  definition: 

hecht-ullman(g,  r)   <-> 

(E xl in nodes(g), x2 in nodes(g), pi in paths(g, r, xl),
   p2 in paths(g, r, x2), p3 in paths(g, xl, x2),
   p4 in paths(g, x2, xl) |
   x2 ~in range pi & xl ~in range p2 &
   (range pi + range p2) * (range p3 + range p4) = {xl, x2})
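
To make the definition concrete, the following Python sketch checks the Hecht-Ullman condition by brute force on a small graph. It is an illustration, not part of the verified development. Assumptions: the graph is a set of (source, target) pairs; the nodes and paths printed above as xl, x2, pi, p2, p3, p4 appear in the code as x1, x2, p1, p2, p3, p4; and only simple paths are enumerated, which should suffice because any path contains a simple path with the same endpoints and a smaller range.

```python
def simple_paths(g, src, dst):
    """Yield every simple path (a tuple of nodes) from src to dst in g."""
    stack = [(src,)]
    while stack:
        p = stack.pop()
        if p[-1] == dst:
            yield p
            continue
        for (u, v) in g:
            if u == p[-1] and v not in p:
                stack.append(p + (v,))


def hecht_ullman(g, r):
    """Brute-force test for a double entry loop reachable from r."""
    nodes = {n for e in g for n in e}
    for x1 in nodes:
        for x2 in nodes:
            if x1 == x2:
                continue
            for p1 in simple_paths(g, r, x1):
                if x2 in p1:
                    continue
                for p2 in simple_paths(g, r, x2):
                    if x1 in p2:
                        continue
                    entry = set(p1) | set(p2)
                    for p3 in simple_paths(g, x1, x2):
                        for p4 in simple_paths(g, x2, x1):
                            loop = set(p3) | set(p4)
                            if entry & loop == {x1, x2}:
                                return True
    return False


# A loop {a, b} entered from r at both a and b.
double_entry = {("r", "a"), ("r", "b"), ("a", "b"), ("b", "a")}
# The same loop entered only at a.
single_entry = {("r", "a"), ("a", "b"), ("b", "a")}
print(hecht_ullman(double_entry, "r"), hecht_ullman(single_entry, "r"))
```

The enumeration is exponential in the worst case, so the sketch is only meant for checking small examples by hand.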

The  first  lemma  below  states  that  if  xl  and  x2   are   nodes   of  g 

which   satisfy   the  Hecht-Ullman  criterion,  and  if  i  is  a  strong 

interval with head x, then xl and x2 are not in the set i - {x}.

LEMMA 6.1

pi in paths(g, r, xl) & p2 in paths(g, r, x2) &
p3 in paths(g, xl, x2) & p4 in paths(g, x2, xl) &
x2 ~in range pi & xl ~in range p2 &
(range pi + range p2) * (range p3 + range p4) = {xl, x2} &
strong-interval(i, g, x)

->

xl ~in i - {x} & x2 ~in i - {x}

PROOF. 

[We assume xl is in i - {x}. pi therefore contains the interval
head x. It follows that x2 is in i - {x}. Otherwise, the path
p4 from x2 to xl intersects i at a point which is not the
interval head, since p4 and pi intersect only at xl. We then
show that range p3 + range p4 is a subset of i - {x} by a similar
argument. We can therefore construct a cycle in i which does not
contain the interval head. Therefore, xl is not in i - {x}.
Similarly, x2 is not in i - {x}.]

Suppose 

xl  in  i  -  {x} 
Then  we  claim  that 

(1)  x2  in  i  -  {x} 
We  prove  (1)  as  follows. 
Suppose 

x2 ~in i - {x}.
Then,

(E 1 < k < #p4 | p4(k) ~in i & p4(k+1) in i)
Since i is an interval,

p4(k+1) = x.
Also,
(E 1 < j < #pi | pi(j) ~in i & pi(j+1) in i)
Again, since i is an interval,

pi(j+1) = x.
Therefore,

x in range pi * range p4
However,  by  hypothesis, 

(2)  range  pi  *  (range  p3  +  range  p4)  =  {xl} 
But,  xl  ~=  x.   Therefore,  (1)  holds. 

Next,  we  claim 

(3)  range  p3  +  range  p4  subset  i  -  {x}. 
Note  that  since 

x in range pi & x ~= xl & x ~= x2
we have

(4)  x ~in range p3 + range p4

Suppose  that  (3)  is  false.   Then 

(E 1 < k < #p3 | p3(k) ~in i-{x} & p3(k+1) in i-{x}) or
(E 1 < j < #p4 | p4(j) ~in i-{x} & p4(j+1) in i-{x})

By (4), this is equivalent to:

(E 1 < k < #p3 | p3(k) ~in i & p3(k+1) in i - {x})
or
(E 1 < j < #p4 | p4(j) ~in i & p4(j+1) in i - {x})

Therefore, since i is an interval, we conclude

p3(k+1) = x or p4(j+1) = x.

However   this   contradicts   (4).    Therefore,   (3)   holds.    Next 
consider 

p = p3(1 : #p3 - 1) || p4.
p is a path from xl to xl, so

(E p' in paths(g, xl, xl) |
   simple(p') & range p' subset range p)

Therefore,

E p' | iscycle(p', g) & range p' subset i & x ~in range p'.

Therefore,

~strong-interval(i, g, x).

We have shown that xl ~in i-{x}, and by the same argument, we can
show  that 

x2  ~in  i  -  {x}. 

QED. 

The next lemma states that if g has a hecht-ullman
configuration, and gl is a collapsed graph of g, then gl has a
hecht-ullman configuration.

LEMMA  6.2 

hecht-ullman(g, r) & strong-interval(i, g, x) &
gl = collapse(g, i, x)

->
hecht-ullman(gl, r)

PROOF. 

By  the  previous  lemma 

xl ~in i - {x} & x2 ~in i - {x}.

By Lemma 2.1,
r ~in i.

Therefore, by Lemma 2.5,

E pi' in paths(gl, r, xl) | range pi' subset range pi
E p2' in paths(gl, r, x2) | range p2' subset range p2
E p3' in paths(gl, xl, x2) | range p3' subset range p3
E p4' in paths(gl, x2, xl) | range p4' subset range p4

It  follows  that 

(range  pi'  +  range  p2')  *  (range  p3'  +  range  p4')  = 
{xl,  x2} 


and
xl ~in range p2' & x2 ~in range pi'.
Therefore

hecht-ullman(gl, r).

QED. 

We  next  prove  that  if  gl  has  the  hecht-ullman  configuration,  then 
g  does  also. 

LEMMA  6.3 

strong-interval(i, g, x) & gl = collapse(g, i, x) &
hecht-ullman(gl, r)
-> 

hecht-ullman(g,  r) 

PROOF. 

From  Lemma  2.6,  we  know  that 

E pi' in paths(g, r, xl) | range pi = range pi' - (i-{x})
E p2' in paths(g, r, x2) | range p2 = range p2' - (i-{x})
E p3' in paths(g, xl, x2) | range p3 = range p3' - (i-{x})
E p4' in paths(g, x2, xl) | range p4 = range p4' - (i-{x})
Suppose 

xl  in  range  p2' . 
Since xl in nodes(gl), xl ~in i - {x}. Therefore,

xl  in  range  p2. 
This  contradicts  the  hypothesis.   Therefore, 

xl  ~in  range  p2'. 
By  a  similar  argument,  we  can  show  that 

x2 ~in range pi'.
Next,  suppose 
E  z  in  range  pi'  *  (range  p3'+range  p4')  |  z  ~=  xl  &  z  ~=  x2. 

z ~in i
implies that

z in range pi * (range p3 + range p4)
which contradicts the hypothesis. Therefore,

z in i.
Furthermore, 

z  in  range  pi'  ->  x  in  range  pi' 
since  intervals  are  single  entry.   Since  x  in  range  pi', 

x in range pi.
By a similar argument,

x in (range p3 + range p4).
This implies that

x = xl.

We  can  assume  without  loss   of  generality   that   pi   is   simple. 
Therefore, 

range  pi'  *  (range  p3'  +  range  p4')  =  {xl}. 
By  a  similar  argument,  we  show  that 

range  p2'  *  (range  p3'  +  range  p4')  =  {x2}. 
It  follows  that 

hecht-ullman(g,  r). 

QED. 

LEMMA  6.4 

hecht-ullman(g,  r)  ->  ~reducible(g,  r) 

PROOF. 

We prove this lemma by induction on #nodes(g), using induction
schema (TH6).

Suppose 

hecht-ullman(g,  r)  &  reducible(g,  r) 

and 

A gl | flowgraph(gl, r) & #nodes(gl) < #nodes(g)
   & hecht-ullman(gl, r) ->
   ~reducible(gl, r)
Since hecht-ullman(g, r), we can concatenate the paths p3 and p4
to produce a path in g from xl to xl. Let

p = p3(1 : #p3-1) || p4.
Then 

p in paths(g, xl, xl). We apply Lemma 1.6 to obtain

E  p'  in  paths(g,  xl,  xl)  |  iscycle(p',  g) 

Therefore,    g   is   not    reduced.      But   since   g   is   reducible, 

E i, x | strong-interval(i, g, x) &
   reducible(collapse(g, i, x), r)

Let

gl = collapse(g, i, x)
By Lemma 6.2,

hecht-ullman(gl, r).
But, by the induction hypothesis,

~reducible(gl, r),
which contradicts reducible(gl, r). So

~reducible(g, r).

QED. 

LEMMA   6.5 

(E x in nodes(g) - {r}, [t, m] in depth-first-trees(g, r) |
   r in reachunder(g, t, x)) ->
hecht-ullman(g, r)

PROOF. 

Let 

xl = arb {x in nodes(g) - {r} |
   E [t, m] in depth-first-trees(g, r) |
   r in reachunder(g, t, x) }

and let

[t, m] = arb {[t, m] in depth-first-trees(g, r) |
   r in reachunder(g, t, xl) }.

Then r in reachunder(g, t, xl) implies


E z | backedge([z, xl], t) &
   haspath({[v, u] in g-inv | u ~= xl}, z, r)

This is equivalent to

E z | backedge([z, xl], t) &
   haspath({[u, v] in g | u ~= xl}, r, z).

Let z be such a node, and pO a path from r to z in the graph

{[u, v] in g | u ~= xl}.
That is, let

pO = arb paths({[u, v] in g | u ~= xl}, r, z).
Note  that  since 

backedge( [z,  xl] ,  t), 
we  know  that 

haspath(t,  xl,  z). 
Let 

pt  =  arb  paths(t,  xl,  z), 
and  let 

s  =  range  pt  *  range  pO. 
Note  that  since 

z  in  range  pt  &  z  in  range  pO, 

s  ~=  {}. 

We  next  choose  x2  to  be  the  first  element  on  the  path  pO  which 
belongs  to  s.   Let 

x2  =  arb  {y  in  s  |  pO-inv(y)  = 

min  /  {pO-inv(z)  :  z  in  s}}. 

We  now  specify  paths  pi,  p2,  p3,  and  p4.   Let 

pi  =  arb  paths(t,  r,  xl). 
That  is,  pi  is  a  path  from  the  root  of  t  to  xl.   Let 

p2 = pO(1 : pO-inv(x2)).
That is, p2 is the initial segment of pO from the root to x2.
Let

p3 = pt(1 : pt-inv(x2)).

p3 is the initial segment of pt from xl to x2. Finally, let

p4 = pt(pt-inv(x2) : #pt) || [xl].

p4 is a path from x2 to xl, consisting of the final segment of pt
plus xl. It is not hard to show that p4 is a path in g. We
next must show that the disjointness properties hold.
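
The construction of x2, p2, p3, and p4 above can be followed on a small example. The Python sketch below is an illustration under assumptions: paths are lists of distinct nodes, so that the inverse map of a path is given by list.index; the names p0 and x1 stand for the path and node printed as pO and xl; and the example graph (tree path r, x1, a, z with back edge from z to x1, plus edges from r to b and from b to a) is hypothetical.

```python
def build_configuration(p0, pt, x1):
    """Given p0 (a path from r to z avoiding x1) and pt (the tree path
    from x1 to z), return x2 and the paths p2, p3, p4 of the proof."""
    s = set(pt) & set(p0)              # s = range pt * range p0
    x2 = min(s, key=p0.index)          # first element of p0 lying in s
    p2 = p0[: p0.index(x2) + 1]        # initial segment of p0 up to x2
    p3 = pt[: pt.index(x2) + 1]        # initial segment of pt up to x2
    p4 = pt[pt.index(x2):] + [x1]      # rest of pt, then the back edge
    return x2, p2, p3, p4


p1 = ["r", "x1"]                       # tree path from the root to x1
p0 = ["r", "b", "a", "z"]              # avoids x1
pt = ["x1", "a", "z"]                  # tree path from x1 to z
x2, p2, p3, p4 = build_configuration(p0, pt, "x1")

entry = set(p1) | set(p2)
loop = set(p3) | set(p4)
print(x2, p2, p3, p4, entry & loop == {"x1", x2})
```

In this example x2 is a, and the four paths meet only at x1 and x2, as Claims 1 to 4 below assert in general.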
Claim  1.   x2  ~in  range  pi. 
By  definition  of  s  and  x2, 

x2  in  range  pt. 
Therefore,  since  pt  is  a  tree  path, 

haspath(t,  xl,  x2). 
Next, 

x2  in  range  pi  ->  haspath(t,  x2 ,  xl). 

Since  xl  ~=  x2,  and  haspath  defines  a  partial  ordering,  we  obtain 
a  contradiction. 

Claim 2.   xl ~in range p2.

By  definition  of  reachunder, 

xl  ~in  range  pO. 
By  definition  of  p2 , 

range  p2  subset  range  pO. 
Therefore, 

xl  ~in  range  p2. 
Claim  3.   range  pi  *  (range  p3  +  range  p4)  =  {xl} 

xl = pi(#pi) = p3(1).
Therefore, 

xl  in  range  pi  *  (range  p3  +  range  p4). 
Consider
n = arb {n in range p3 + range p4 | n ~= xl}.
Then 

haspath(t,  xl,  n) . 
But, 

n  in  range  pi  ->  haspath(t,  n,  xl) 
which  gives  a  contradiction. 
Claim  4.   range  p2  *  (range  p3  +  range  p4)  =  {x2} 

x2 = p2(#p2) = p3(#p3) ->

x2  in  range  p2  *  (range  p3  +  range  p4) 
Because  p2  is  an  initial  segment  of  pO  up  to  x2 , 

(1)  (A  n  in  range  p2  |  pO-inv(n)  <  pO-inv(x2)) 
Suppose 

n  in  range  p2  *  (range  p3  +  range  p4) , 
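and

n ~= x2.
We know that

n ~= xl
since

xl ~in range p2.

range p3 + range p4 = range pt,
so n is in s = range pt * range pO. By (1), n precedes x2 on pO,
which contradicts the choice of x2 as the first element of pO
which belongs to s. Therefore Claim 4 holds, and we conclude

hecht-ullman(g, r).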
QED. 

LEMMA     6.6 

E [t, n] in depth-first-trees(g, r) | A x in nodes(g) - {r} |
   r ~in reachunder(g, t, x)

->
reducible(g, r)
PROOF.

We  use  the  induction  schema  (TH6).   Suppose  g  is  a  flowgraph  with 

root  r  which  satisfies  the  hypothesis  of  the  lemma,  such  that 

~reducible(g, r) & A g' | #nodes(g') < #nodes(g) ->
   E [t, n] in depth-first-trees(g', r) | A x in nodes(g') - {r} |
      r ~in reachunder(g', t, x)
   ->
   reducible(g', r)

Since  g  is  not  reducible,  g  contains  a  cycle.    Suppose   t   is   a 

depth   first   spanning  tree  which  satisfies  the  hypothesis.   Then 

by  Lemma  3.8,  g  contains  a  backedge  with  respect  to  t.   We  choose 

x to be the backedge target with the largest node number.

E x in targ-back-edges(g, t) |
   A y in targ-back-edges(g, t) | n(x) >= n(y)

Since  r  ~in  reachunder (g,  t,  x) , 

strong-interval(reachunder(g,  t,  x)  +  {x},  g,  x) 
Let 

i  =  reachunder (g,  t,  x)  +  {x}, 
and 

gl = collapse(g, i, x).
Since g is not reducible, gl is also not reducible. We also let

[tl, nl] = collapse-tree(t, n, i, x)
By Lemma 5.1,

[tl, nl] in depth-first-trees(gl, r)
We show

A x in nodes(gl) | r ~in reachunder(gl, tl, x)
Suppose 

r  in  reachunder (gl ,  tl,  xl) 
for some xl in nodes(gl). Then
E v in nodes(gl) | [v, xl] in gl & backedge([v, xl], tl)

By Lemma 5.2 (i),

backedge([v, xl], t) & [v, xl] in g
or
v = x & E u in i | [u, xl] in g & backedge([u, xl], t)

Since r in reachunder(gl, tl, xl),

E pi in {p in paths(gl, r, v) | xl ~in range p}

Case I.   backedge([v, xl], t) & [v, xl] in g

From Lemma 2.4

E p in {p' in paths(g, r, v) | range p' subset range pi + i}

Since xl is a backedge target, xl ~in i.
Therefore,

xl ~in range p,
so 

r  in  reachunder (g,  t,  xl) 
This  contradicts  the  assumption  of  the  lemma. 

Case II.   v = x & E u in i | [u, xl] in g & backedge([u, xl], t)
There is a path from x to u that does not contain xl, since u in
i and xl ~in i, and i is a strong interval. Let p be a path from
r to x, as in case I, and let

pxu = arb {pxu in paths(g, x, u) | xl ~in range pxu}.
Then let

pru = p(1 : #p-1) || pxu

xl ~in range pru.
Since [u, xl] is a backedge,

r in reachunder(g, t, xl).
Contradiction.
QED. 

We  now  state  Lemma  3  from  section  5.1.3. 

LEMMA  6.7 
(LEMMA  3.) 

~reducible(g,  r)  <-> 

(E x in nodes(g) - {r}, [t, n] in depth-first-trees(g, r) |
   r in reachunder(g, t, x))

PROOF. 

From  Lemmas  6.4,  6.5,  and  6.6,  we  know  that  the  three 
irreducibility  criteria  stated  in  the  beginning  of  this  section 
are  equivalent.   Therefore,  this  lemma  follows. 

QED. 

We  next  prove  Lemma  1  from  section  5.1.1. 

LEMMA  6.8 
(LEMMA  1) 

collapsed(gl ,  g2)  -> 

reducible(gl ,  r)  <->  reducible(g2,  r) 

Suppose 

reducible(g2,  r) 

Then  since  g2  is  obtained  from  gl  by  collapsing  a  strong 
interval,  there  is  a  path  in  derivable (g)  from  gl  to  g2 ,  so 

reducible(gl,  r) . 
Conversely,  suppose 

~reducible(g2,  r) 
Then 

hecht-ullman(g2,  r) 
By  Lemma  6.3, 

hecht-ullman(gl, r).
Therefore,

~reducible(gl, r).
QED. 


5.2.7  Remaining Lemmas

In  this  section,  we  prove  the  remaining  lemmas  stated  in  the 
previous  part  of  this  chapter,  which  were  used  in  the 
transformation  steps. 

LEMMA 7.

f : smap(s) s &
nocycles(f) ->
(A x in nodes(f) | E y in nodes(f) | f-lim(x) = y)

PROOF. 

Let 

x = arb nodes(f).

Let

p = arb {p' | p' : bmap & ispath(p', f) & p'(1) = x
   & A q | q : bmap & ispath(q, f) & q(1) = x ->
   #q <= #p'}

That is, p is the longest path in f beginning from x, such that
all nodes in the path are distinct. Consider

p(#p)
Clearly,

haspath(f, x, p(#p)).
Suppose 

p(#p)  in  dom  f. 
Then 

E u | [p(#p), u] in f.

If  u  ~in  range  p,  then  we  can  construct  a  path  which   is   longer 
than  p  which  is  one-one.   Therefore, 

u  in  range  p. 
Consider  the  path 

pi = p(p-inv(u) : #p) || [u].
Then pi is a path in f. Moreover, pi is a cycle, which contradicts
nocycles(f). Therefore,

p(#p) ~in dom f.
so p(#p) in limitset(f, x).
QED. 
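
Lemma 7 says that following a cycle-free single-valued map always ends somewhere. The Python sketch below illustrates this reading of f-lim. It rests on assumptions not stated in this section: f is represented as a dict, and f-lim(x) is taken to be the node reached from x by applying f until a node outside dom f is found.

```python
def lim(f, x):
    """Follow the single-valued map f from x until leaving dom f.
    Raises ValueError if a cycle is met, the case nocycles(f) excludes."""
    seen = {x}
    while x in f:
        x = f[x]
        if x in seen:
            raise ValueError("f contains a cycle")
        seen.add(x)
    return x


f = {1: 2, 2: 3, 4: 3}
print(lim(f, 1), lim(f, 4), lim(f, 3))
```

The loop terminates because each step either leaves dom f or visits a new node, which mirrors the longest-path argument in the proof.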


LEMMA  8. 

head : smap(nodes(gO)) nodes(gO) & nocycles(head) &
g = {[head-lim(u), v] : [u, v] in gO | v ~in dom head &
      head-lim(u) ~= v} &
strong-interval(i, g, x) & gl = collapse(g, i, x) &
dom head * nodes(g) = {} &
headl = head + {[w, x] : w in i - {x}}

->
gl = {[headl-lim(u), v] : [u, v] in gO | v ~in dom headl &
       headl-lim(u) ~= v}

PROOF. 

By  definition  of  collapse, 

gl = {[u, v] in g | u ~in i} +
     {[x, v] : [u, v] in g | u in i & v ~in i}
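
The definition of collapse just quoted translates directly into set comprehensions. The Python sketch below is an illustration under the assumption that a graph is a set of (source, target) pairs; the example graph and interval are hypothetical. It also checks the node-set identity of Lemma 2.2 on that example.

```python
def collapse(g, i, x):
    """Collapse the interval i with head x: keep the edges leaving nodes
    outside i, and redirect edges that leave i so they start at x."""
    kept = {(u, v) for (u, v) in g if u not in i}
    redirected = {(x, v) for (u, v) in g if u in i and v not in i}
    return kept | redirected


def nodes(g):
    """dom g + range g."""
    return {n for e in g for n in e}


g = {("r", "x"), ("x", "a"), ("a", "x"), ("a", "b")}
i = {"x", "a"}
g1 = collapse(g, i, "x")
print(sorted(g1))
print(nodes(g1) == nodes(g) - (i - {"x"}))
```

Edges with both endpoints in i, including the cycle between x and a, disappear from the collapsed graph.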

By  hypothesis,  gl  is  therefore  equivalent  to 

{[u, v] in {[head-lim(z), w] : [z, w] in gO | w ~in dom head
   & head-lim(z) ~= w} | u ~in i} +
{[x, v] : [u, v] in {[head-lim(z), w] : [z, w] in gO |
   w ~in dom head & head-lim(z) ~= w} | u in i & v ~in i}

We simplify this expression to obtain:

(1)  {[head-lim(z), w] : [z, w] in gO | w ~in dom head &
        head-lim(z) ~= w & head-lim(z) ~in i} +
     {[x, w] : [z, w] in gO | w ~in dom head & head-lim(z) ~= w
        & head-lim(z) in i & w ~in i}

By  hypothesis,  we  know  that 

dom  headl  =  dom  head  +  i  -  {x}, 

w in dom headl & [z, w] in gO & w ~in dom head
->
z = x

head-lim(z) in i <-> headl-lim(z) = x
and

head-lim(z) ~in i <-> head-lim(z) = headl-lim(z)
Therefore,  (1)  is  equivalent  to 


{[headl-lim(z), w] : [z, w] in gO | w ~in dom headl &
   headl-lim(z) ~= w & headl-lim(z) ~= x} +
{[headl-lim(z), w] : [z, w] in gO | w ~in dom headl &
   headl-lim(z) ~= w & headl-lim(z) = x}

The above formula simplifies to:

gl = {[headl-lim(z), w] : [z, w] in gO | w ~in dom headl &
   headl-lim(z) ~= w}

QED. 
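
Lemma 8 can be checked on a small instance. The Python sketch below is an illustration only. Assumptions: the map head is a dict from collapsed nodes to their interval heads, head-lim follows it to a fixed point, graphs are sets of (source, target) pairs, and the names g0 and head1 stand for the objects printed as gO and headl. The example graph is hypothetical, and the set i is simply assumed to be a strong interval with head x.

```python
def lim(head, u):
    """head-lim(u): follow head from u until leaving dom head."""
    while u in head:
        u = head[u]
    return u


def represented(g0, head):
    """The graph that g0 and head represent, as in the lemma's hypothesis."""
    return {(lim(head, u), v) for (u, v) in g0
            if v not in head and lim(head, u) != v}


def collapse(g, i, x):
    """Collapse interval i with head x, by the definition of collapse."""
    return ({(u, v) for (u, v) in g if u not in i}
            | {(x, v) for (u, v) in g if u in i and v not in i})


g0 = {("r", "x"), ("x", "a"), ("a", "x"), ("a", "b")}
head = {}                                  # nothing collapsed yet
g = represented(g0, head)
i, x = {"x", "a"}, "x"
head1 = {**head, **{w: x for w in i - {x}}}
print(represented(g0, head1) == collapse(g, i, x))
```

Updating head is thus an alternative to rebuilding the edge set: the collapsed graph can be recovered from g0 and head1 alone.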

This completes the proof of the lemmas stated in section 5.1. In
the next section we verify several lemmas in greater detail to
estimate the total number of proof steps.


5.2.8  Estimate  of  Proof  Checker  Steps 

In  order  to  estimate  the  number  of  proof  checker  steps 
required  to  fully  verify  Tarjan's  algorithm,  we  expand  the  proofs 
of  several  lemmas  from  the  preceding  sections  by  supplying  a 
sequence  of  hypothetical  proof  checker  steps.  We  assume  the 
proof  checker  capabilities  described  in  section  3.4. 

LEMMA  2.2 

A [u, v] in g | (u ~in i -> (v ~in (i - {x}))) &
gl = collapse(g, i, x)

-> 
nodes(gl)  =  nodes(g)  -  (i  -  {x}) 

PROOF. 

We  first  show  that 

dom  gl  +  range  gl  subset  dom  g  +  range  g  -  (i  -  {x}) 
and then show that
dom g + range g - (i - {x}) subset dom gl + range gl
Assume:

(1)  y in dom gl + range gl

(2)  gl = {[u, v] : u, v, e | e in g &
             e = [u, v] & u ~in i} +
          {[x, v] : u, v, e | e in g & e = [u, v] & u in i & v ~in i}

(3)  y ~in dom g + range g - (i - {x})

(We assume that the lemma is false, and derive
a contradiction.)

(4)  x in dom g + range g - (i - {x})

(5)  A e in g, u, v | e = [u, v] -> (u ~in i -> (v ~in (i - {x})))
The following formula is decidable:

(6)  A  =  B  +  C  ->  dom  A  +  range  A  =  dom  B  +  range  B  +  dom  C  + 
range  C 

We  replace  gl  in  (1)  by  its  definition  in  (2), 
blob  formula  (1),  and  apply  (6): 

(7)  dom  gl  +  range  gl  = 

{u : u, v, e | e in g & e = [u, v] & u ~in i} +
{v : u, v, e | e in g & e = [u, v] & u ~in i} +
{x : u, v, e | e in g & e = [u, v] & u in i & v ~in i} +
{v : u, v, e | e in g & e = [u, v] & u in i & v ~in i}

We normalize (7) and reduce:

(8)  (7) =
     {z : u, v, e | e in g & e = [u, v] & u ~in i & z = u
        or
        e in g & e = [u, v] & u ~in i & z = v
        or
        e in g & e = [u, v] & u in i & v ~in i & x = z
        or
        e in g & e = [u, v] & u in i & v ~in i & v = z}

Substituting  (8)  for  the  right  side  of  (1),  we  reduce  (1): 

(9)  E e in g, u, v | [u, v] = e & u ~in i & y = u
        or
        [u, v] = e & u ~in i & y = v
        or
        [u, v] = e & u in i & v ~in i & x = y
        or
        [u, v] = e & u in i & v ~in i & v = y

We next perform similar reductions on the expression dom g +
range  g: 

(10)  dom  g  +  range  g  = 

{z : u, v, e | e in g & e = [u, v] & z = u
   or
   e in g & e = [u, v] & z = v}
The  following  formula  is  decidable: 

(11)  A  =  B  -  C   &  y  ~in  A  -> 

(y  in  B  ->  y  in  C) 

We  apply  (11)  to  (3)  and  obtain: 

(12)  y  in  dom  g  +  range  g   ->  y  in  (i  -  {x}) 

We  then  replace  dom  g  +  range  g  in  (12)  by  (10).   We  reduce  to 
obtain: 

(13)  E e in g, u, v | e = [u, v] & u = y or
         e = [u, v] & v = y
      ->
      y in (i - {x})

We transform (13) into conjunctive normal form:

(14)  A e in g, u, v | ([u, v] = e & u = y) or ([u, v] = e & v = y)
      ->
      y in (i - {x})


We are now done since (3), (4), (5), (9), and (14) yield a
contradiction by a trivial instantiation and applying a decision
algorithm to the ground formula.

We  next  prove  the  converse.   We  add  the  additional 
assumption: 

(15)  E u | [u, x] in g & u ~in i
It follows from (2) that

(16)  x in range gl

It follows immediately that

(17)  x in dom gl + range gl

We delete (1) and (3) and assume:

(18)  y in dom g + range g - (i - {x})
(19)  y ~in dom gl + range gl
From  (18)  we  obtain 

(20)  y  in  dom  g  +  range  g 
and 

(21)  y  ~in  (i  -  {x}) 

We  apply  reduction  to  (20)  to  obtain 

(22)  E e, u, v | e = [u, v] & u = y or e = [u, v] & v = y

If we reduce (19), using the previous expansion of dom gl + range gl,
we obtain

(23)  A e, u, v | e = [u, v] ->
         (u ~in i -> y ~= u) &
         (u ~in i -> y ~= v) &
         (u in i & v ~in i -> x ~= y) &
         (u in i & v ~in i -> v ~= y)

We can automatically obtain a contradiction from (17), (19), (21),
(22), and (23).

QED. 

LEMMA   2.4 

Define 

(D1)  pathinv(v) = arb {p' in paths(g, x, v) |
         range p'(1 : #p'-1) subset i}

(D2)  defpath(q) = if #q = 1 then q elseif
         [q(#q-1), q(#q)] ~in g then
            defpath(q(1 : #q-2)) || pathinv(q(#q))
         else
            defpath(q(1 : #q-1)) || [q(#q)]
         end
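
Definitions (D1) and (D2) can be exercised on a small example. The Python sketch below is an illustration under assumptions: paths are lists, graphs are sets of (source, target) pairs, arb is replaced by a breadth-first search that returns one qualifying path, and a two-node path whose only edge is a collapsed edge is given an empty prefix, a case that (D2) does not spell out. The example graph is hypothetical.

```python
from collections import deque


def pathinv(g, i, x, v):
    """One path in g from x to v whose nodes, except the last, lie in i."""
    queue = deque([[x]])
    seen = {x}
    while queue:
        p = queue.popleft()
        for (a, b) in sorted(g):       # sorted only to make the choice fixed
            if a != p[-1]:
                continue
            if b == v:
                return p + [v]
            if b in i and b not in seen:
                seen.add(b)
                queue.append(p + [b])
    raise ValueError("no such path")


def defpath(q, g, i, x):
    """Expand a path q of the collapsed graph into a path of g."""
    if len(q) == 1:
        return list(q)
    if (q[-2], q[-1]) not in g:        # collapsed edge [x, q(#q)]
        prefix = defpath(q[:-2], g, i, x) if len(q) > 2 else []
        return prefix + pathinv(g, i, x, q[-1])
    return defpath(q[:-1], g, i, x) + [q[-1]]


g = {("r", "x"), ("x", "a"), ("a", "x"), ("a", "b")}
i, x = {"x", "a"}, "x"
q = ["r", "x", "b"]                    # a path in collapse(g, i, x)
p = defpath(q, g, i, x)
print(p, set(q) == set(p) - (i - {x}))
```

The collapsed edge from x to b is replaced by the detour through a inside the interval, and removing i - {x} from the range of the result gives back the range of q.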

Lemma  2.4  states: 

strong-interval(i, g, x) &
ispath(pl, gl) & p = defpath(pl)

->
p in paths(g, pl(1), pl(#pl)) & range pl = range p - (i - {x})

PROOF.
Assume:

(1)  strong-interval(i,  g,  x) 

(2)  ispath(pl,    gl) 

(3)  p   =  defpath(pl) 

We substitute the lemma 2.4 in the induction schema (TH6)
obtaining:

(4)  A pl, p | text of lemma 2.4 or
     E pl, p | not text of lemma 2.4 &
        A pl' | #pl' < #pl -> text of lemma 2.4(pl \ pl')

We assume:

(5)  not text of lemma 2.4 &
     A pl' | #pl' < #pl -> text of lemma 2.4(pl \ pl')

and derive a contradiction for the three cases of defpath(pl).

Case   I. 

Assume 

(6)  #pl = 1
Then 

(7)  p = pl

By definition of ispath and (2) we have

(8)  pl : tuple(nodes(gl))
By Lemma 2.2,

(9)  nodes(gl) = nodes(g) - (i - {x})

It is therefore decidable from (8) and (9) that

(10)  range pl = range pl - (i - {x})
Furthermore, it follows from (8) and (9) that

(11)  pl : tuple(nodes(g)).

Since #pl = 1, it follows from the definition of edges that

(12)  edges(pl) = {[pl(k), pl(k+1)] : 1 <= k < 1} = {}
and so
(13)  edges(pl) subset g
Therefore,

(14)  p in paths(g, pl(1), pl(#pl))

(10)    and    (14)    contradict    the  assumption    (5). 

Case   II. 

Assume 

(15)  #pl    >    1      &      [pl(#pl-l),    pl(#pl)]    ~in  g 
Then 

(16)  p = defpath(pl(1:#pl-2)) || pathinv(pl(#pl))
Let

(17)  pl' = pl(1:#pl-2) & p2 = defpath(pl')
It follows that

(18)  #pl' < #pl

Therefore, from (5) we obtain that

(19)  text of lemma 2.4 (pl \ pl', p \ p2)

From lemma 1.4 (ii) and the definition (17) we can conclude

(20)  ispath(pl', gl).

Therefore, the conclusion of lemma 2.4 holds for pl':

(21)  p2 in paths(g, pl'(1), pl'(#pl')) & range pl' =
         range p2 - (i - {x})

By definition (17) of pl',

(22)  pl'(1) = pl(1) & pl'(#pl') = pl(#pl-2)
Therefore, (21) simplifies to:

(23)  p2 in paths(g, pl(1), pl(#pl-2)) & range pl' =
         range p2 - (i - {x})

Next, let

(24)  p3 = pathinv(pl(#pl)) =
         arb {p' in paths(g, x, pl(#pl)) |
            range p'(1 : #p'-1) subset i}
We must show that such a path exists.
Suppose 

(25)  e in gl & e ~in g

Then    from   the   definition   of  gl, 

(26)  E   el    in  g,    u,    v    |    el    =    [u,    v]    &   u   in   i   &  v   ~in   i      & 

e   =    [x,    v] 

From  the  definition  of  edges, 

(27)  [pl(#pl-l),  pl(#pl)]  in  edges(pl) 
so by (2) and the definition of ispath,

(28)  [pl(#pl-l),  pl(#pl)]  in  gl 
By  (26),  (15)  and  (28), 

(29)  E  el  in  g,  u,  v  |  el  =  [u,  v]   &  u  in  i  &  v  ~in  i  & 

         [x, v] = [pl(#pl-1), pl(#pl)]

We instantiate (29),

(30)  el in g & el = [u, v] & u in i & v ~in i &
         [x, v] = [pl(#pl-1), pl(#pl)]

and since u in i, we know by the
definition of interval,

(31)  E p4 in paths(g, x, u) | range p4 subset i
We instantiate (31):

(32)  p4 in paths(g, x, u) & range p4 subset i
and let

(33)  p5 = p4 || [pl(#pl)]

We  claim  by  lemma  1.2  and  statement  (30)  that 

(34)  ispath(p5,  g) 
Furthermore,  by  (30)  and  (33), 

(35)  p5(1) = x & p5(#p5) = pl(#pl)
so

(36)  p5 in paths(g, x, pl(#pl))
From (31), (33), and (36) we have

(37)  E p' in paths(g, x, pl(#pl)) | range p'(1 : #p'-1) subset i

Finally, we consider

(38)  p = p2 || p3

We wish to show that p satisfies the conclusion of Lemma 2.4.
We first show that p is a path in g, by considering the
edge

(39)  [p2(#p2), p3(1)] = [pl(#pl-2), x]

Since by (30) x = pl(#pl-1), we have by definition of edges,

(40)  [pl(#pl-2), pl(#pl-1)] in edges(pl)
and therefore,

(41)  [pl(#pl-2), x] in gl.
We note by (25) and (26), if

(42)  [pl(#pl-2), x] ~in g,
then

(43)  [pl(#pl-2), x] = [x, x]

This contradicts lemma 2.6(iii). Therefore, (42) is false and we
conclude:

(44)  [pl(#pl - 2), pl(#pl - 1)] in g
By Lemma 1.2,

(45)  ispath(p, g)

From (23), (24), (38), and (45),

(46)  p in paths(g, pl(1), pl(#pl)).

Therefore,    (5)    is    false. 

Case   III. 

Assume 

(47)  #pl > 1 & [pl(#pl-1), pl(#pl)] in g


Case  Study:  Tarjan's  Graph  Reducibility  Test  Algorithm  PAGE  5-119 

Then, 

(48)  p = defpath(pl(1:#pl-1)) || [pl(#pl)]

We repeat steps (17) - (23) above:
Let

(49)  pl' = pl(1 : #pl - 1) & p2 = defpath(pl')

(50)  #pl' < #pl

so we apply (4) and conclude that lemma 2.4 holds for pl':

(51)  text of lemma 2.4(pl \ pl', p \ p2)
Again, as above,

(52)  ispath(pl', gl)
and so

(53)  p2 in paths(g, pl'(1), pl'(#pl')) &
         range pl' = range p2 - (i - {x})

which simplifies to

(54)  p2 in paths(g, pl(1), pl(#pl-1)) &
         range pl' = range p2 - (i - {x})

Then if we consider p = p2 || [pl(#pl)], and apply lemma 1.2
and (47) we obtain

(55)  p in paths(g, pl(1), pl(#pl))
Finally, from (48), (49), and (50),

(56)  range pl = range p - (i - {x})

From  (55)  and  (56),  (5)  is  false. 

We  conclude  from  Cases  I  -  III  that  (5)  is  false,  and  so  from 
(4), 

(57)  A pl, p | text of lemma 2.4
QED. 

LEMMA  6.4 

hecht-ullman(g, r) -> ~reducible(g, r)

PROOF.
We apply induction schema (TH6) to obtain:


(1)  A g | hecht-ullman(g, r) -> ~reducible(g, r) or
     E g | (hecht-ullman(g, r) -> reducible(g, r) &
        A g' | #nodes(g') < #nodes(g) ->
        (hecht-ullman(g', r) -> ~reducible(g', r)))

We  assume 

(2)  E g | (hecht-ullman(g, r) -> reducible(g, r) &
        A g' | #nodes(g') < #nodes(g) ->
        (hecht-ullman(g', r) -> ~reducible(g', r)))

and instantiate (2):

(3)  hecht-ullman(g, r) -> reducible(g, r) &
        A g' | #nodes(g') < #nodes(g) ->
        (hecht-ullman(g', r) -> ~reducible(g', r))

and   assume: 

(4)  hecht-ullman(g,    r) 

By  definition  of  hecht-ullman, 

(5)  E xl in nodes(g), x2 in nodes(g), pi in paths(g, r, xl),
        p2 in paths(g, r, x2), p3 in paths(g, xl, x2),
        p4 in paths(g, x2, xl) |
        x2 ~in range pi & xl ~in range p2 &
        (range pi + range p2) * (range p3 + range p4) = {xl, x2}

We instantiate (5):

(6)  xl in nodes(g) & x2 in nodes(g) & pi in paths(g, r, xl) &
        p2 in paths(g, r, x2) & p3 in paths(g, xl, x2) &
        p4 in paths(g, x2, xl) &
        x2 ~in range pi & xl ~in range p2 &
        (range pi + range p2) * (range p3 + range p4) = {xl, x2}

Let 

(7)  p = p3(1:#p3-1) || p4.

By the definition of paths and (6),

(8)  p3(#p3) = p4(1)

Therefore, we can apply Lemma 1.3 and obtain:

(9)  ispath(p, g)
Next, by definition of paths and (6) and (7),

(10)  p(1) = xl & p(#p) = xl
so

(11)  p in paths(g, xl, xl).
We claim from (7) that

(12)  #p > 1
Therefore, from lemma 1.6,

(13)  E p' in paths(g, xl, xl) | cycle(p') & range p' subset
         range p

We  now  return  to  the  original  claim,  and  from  (3)  and  (4) 
obtain: 

(14)   reduclble(g,  r) 

By  the  definition  of  reducible, 

(15)  E  gl  I  cyclefree(gl)   &   haspath(derivable(g) ,  g,  gl) 
We  instantiate  (15): 

(16)  cyclefree (gl)   &  haspath(derivable (g) ,  g,  gl) 

Our  next  goal  is  to  show  that  there  is  a  graph  g'  which  can  be 
obtained  from  g  by  collapsing.   Suppose 

(17)  derlvable(g){g}   =   {} 

We  derive  a  contradiction  of  (17)  as  follows. 

By definition of derivable, and by reduction, (17) implies

(18)  {g' | collapsed(g, g')} = {}

By definition of haspath and (16),

(19)  E pg | ispath(pg, derivable(g)) & pg(1) = g & pg(#pg) = g1

We instantiate (19):

(20)  ispath(pg, derivable(g)) & pg(1) = g & pg(#pg) = g1

By definition of ispath,

(21)  {[pg(k), pg(k+1)] : 1 <= k < #pg} subset derivable(g)
Suppose

(22)  #pg >= 2

Then

(23)  [pg(1), pg(2)] in derivable(g)

By (20), pg(1) = g, and so

(24)  [g, pg(2)] in derivable(g)

However, (24) contradicts (17). Therefore,

(25)  #pg = 1

But this implies

(26)  pg(1) = pg(#pg) = g = g1

But now from (13), (16), and (26) we obtain a contradiction.
Therefore, (18) is false and

(27)  E g' | collapsed(g, g') & pg(2) = g'

We instantiate (27):

(28)  collapsed(g, g') & pg(2) = g'

Since  g'  is  in  the  range  of  pg  we  know  from  properties  of  haspath 
that 

(29)  haspath(reducible(g), g', g1)
and  so 

(30)  reducible(g',  r). 
By  definition  of  collapsed, 

(31)  E i, x | strong-interval(i, g, x) & g' = collapse(g, i, x)

We instantiate (31):

(32)  strong-interval(i, g, x) & g' = collapse(g, i, x)
By  Lemma  6.2, 

(33)  hecht-ullman(g' ,  r) 
By  Lemma  2.3, 

(34)  #nodes(g') < #nodes(g)

By hypothesis (4),

(35)  ~reducible(g')

This  contradicts  (30).   Therefore  by  (1), 

(36)  A g | hecht-ullman(g, r) -> ~reducible(g, r)

QED. 

From  this  experiment,  we  conclude  that  a  detailed  proof 
checker  verification  expands  the  number  of  steps  by  roughly  a 
factor  of  3.  Therefore,  the  entire  proof  presented  in  section 
5.2 would require approximately 2,000 proof checker steps.

5.3 Proving Tarjan's Algorithm Without Transformations

In  order  to  evaluate  the  effectiveness  of  our 
transformational  approach  to  program  verification,  we  compare  our 
development  of  Tarjan's  algorithm  with  a  static  proof  of  our 
final  version  of  the  algorithm.  That  is,  we  will  add  to  the  code 
text  the  loop  invariants  which  are  necessary  to  prove  the  output 
assertion,  and  we  will  produce  the  resultant  verification 
conditions.  Once  verification  conditions  are  generated,  lemmas 
1-8  can  be  used  to  verify  them. 

Even  though  the  final  optimized  form  of  the  algorithm  makes 
no  reference  to  an  explicit  collapsed  graph  sequence  and 
corresponding  depth  first  spanning  trees,  its  proof  requires 
explicit  mention  of  the  existence  of  these  objects.  Moreover, 
the  variables  of  the  program  must  be  mapped  onto  these  implicit 
objects.  Therefore,  supplying  loop  invariants  requires 
reconstructing  previous  versions  of  our  algorithm  derivation. 
Our  contention  is  that  these  formulae  are  unwieldy  and  difficult 
to  manipulate.  We  also  believe  that  it  would  be  extremely  hard 
to  formulate  appropriate  loop  invariants  without  a  deep 
understanding  of  the  roots  of  the  algorithm,  as  detailed  in  our 
transformational  development. 

To  demonstrate  this,  we  will  sketch  the  proof  of  the  version 
of  Tarjan's  algorithm  given  below. 

(1)   proc testreduce(g) returns reduceflag;
      |= flowgraph(g, r) & r ~in range g &
         (A e in g | e(1) ~= e(2))

@0    [t, n, ndescs] := dfst(g);
      nodeset := nodes(g);
      nodevect := {[v, u] : [u, v] in n};
      backinv := {[v, u] : [u, v] in g | n(v) <= n(u) &
                          n(u) < n(v) + ndescs(v)};
      targbackedges := dom backinv;
      fcomp := {};
      froot := {[z, z] : z in nodeset};
      count := {[z, 1] : z in nodeset};
@1    (forall #nodeset >= y >= 1 | nodevect(y) in targbackedges)
          x := nodevect(y);
          new := {};
@2        (forall v in backinv{x})
@3            [fcomp, h] := findlim(fcomp, v);
              new := new with h;
          end forall;
          reach := {x};
@4        (while new ~= {})
              w from new;
              reach := reach with w;
@5            [froot, fcomp, count] := balance(froot, fcomp,
                                               count, w, x);
@6            (forall u in g-inv{w})
@7                [fcomp, h] := findlim(fcomp, u);
                  if h ~in reach then
                      new := new with h;
                  end if;
              end forall;
          end while;
          if r in reach then
              return false;
          end if;
      end forall;
      return true;

      |= reduceflag <-> reducible(g, r)

      end;
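
For readers who want to run the test on small graphs, the following Python sketch follows the control structure of the praa (1). It is an illustration under stated assumptions, not the thesis's code: a plain path-halving union-find stands in for the findlim/balance pair, the collapse into x is done once after the while loop instead of at each call to balance, and the function name is_reducible is hypothetical. As in the input assumption, every node must be reachable from r, r must have no incoming edge, and g must have no self-loops.

```python
def is_reducible(g, r):
    """Return True iff the flow graph g (a set of (u, v) edges) with root r is reducible."""
    succ, pred = {}, {}
    for u, v in g:
        succ.setdefault(u, []).append(v)
        pred.setdefault(v, []).append(u)

    # Depth-first numbering: n[v] is the preorder number of v, ndescs[v] the
    # size of the subtree rooted at v (v included). Recursive: small graphs only.
    n, ndescs, order = {}, {}, []

    def dfst(u):
        n[u] = len(order) + 1
        order.append(u)
        for v in succ.get(u, []):
            if v not in n:
                dfst(v)
        ndescs[u] = len(order) - n[u] + 1

    dfst(r)

    # backinv[v] lists the sources u of back edges u -> v (u a descendant of v).
    backinv = {}
    for u, v in g:
        if n[v] <= n[u] < n[v] + ndescs[v]:
            backinv.setdefault(v, []).append(u)

    parent = {z: z for z in order}          # union-find forest; roots are limit nodes

    def findlim(z):
        while parent[z] != z:
            parent[z] = parent[parent[z]]   # path halving
            z = parent[z]
        return z

    for x in reversed(order):               # decreasing preorder number
        if x not in backinv:
            continue
        new = {findlim(v) for v in backinv[x]}
        reach = {x}
        while new:
            w = new.pop()
            reach.add(w)
            for u in pred.get(w, []):
                h = findlim(u)
                if h not in reach:
                    new.add(h)
        if r in reach:
            return False
        for w in reach:                     # collapse the nodes found into x
            parent[w] = x
    return True
```

A single loop r -> a -> b -> a is reported as reducible, while the classic two-entry loop (r -> a, r -> b, a -> b, b -> a) is reported as not reducible.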
To verify the praa (1), we must supply invariants for the outer
forall-loop at @1, the inner while loop at @4, and the two inner
forall-loops at @2 and @6. In addition to proving these loop
invariants, we must also prove that the input assumptions of the
functions dfst at @0, findlim at @3 and @7, and balance at @5 are
satisfied before the calls are made. We construct the main outer
loop invariant LOOPINV1 as follows.

The first part of LOOPINV1 reflects the initialization code.

(P1)   flowgraph(g, r) & r ~in range g &
       (A e in g | e(1) ~= e(2)) &
       [t, n] in depth-first-trees(g, r) &
       nodeset = nodes(g) &
       nodevect = {[v, u] : [u, v] in n} &
       backinv = {[v, u] : [u, v] in g | backedge([u, v], t)} &
       targbackedges = dom backinv

Since the free variables of (P1) are not modified within the
loop, it is easy to show that (P1) is invariant.

We next formulate an invariant which can be used to prove
the output assertion. Suppose first that the loop terminates
normally, and reduceflag is true. Then for all backedge targets
x, the root r is not in the reachunder set of x. That is,

(P2)   A #nodeset >= k >= y | r ~in reachunder(g, t,
                                               nodevect(k))

is a loop invariant. Then at loop termination, the assertion

(2)  A y > k >= 1 | ~target-back-edge(nodevect(k), g, t)

is available. From (P2) and (2), it follows that

(3)  A #nodeset >= k >= 1 | r ~in reachunder(g, t,
                                             nodevect(k))

From (3), we can conclude that g is reducible. To prove this,
however, we must first show that (3) implies:

(4)  A [t, n] in depth-first-trees(g, r), x in nodes(g) |
     r ~in reachunder(g, t, x)

It will then follow from our lemmas that:

     reducible(g, r).

Next suppose that reduceflag is false. Then

     r in reachunder(g, t, x)

for some x in nodeset. Therefore,

     ~reducible(g, r).

We must now show that (P2) is loop invariant. The while loop at
@4 computes a set reach such that

(5)  r in reach <-> r in reachunder(g, t, x)
However, a complication arises here because of the calls to
findlim and balance within the loop. Reach is the reachunder set
of x in a graph consisting of limit nodes of a map f. (A node z is
a limit node if z ~in dom f.) Therefore, the assertion

     reach = reachunder(g, t, x) + {x}

is not available. The graph of limit nodes is in fact a
collapsed graph, and so to prove (5), we are forced to introduce
the notion of a collapsed graph. That is, we add the invariant
that there exist variables current-g, current-t, current-n such
that

(P3)   haspath(derivable(g), g, current-g) &
       [current-t, current-n] in depth-first-trees(current-g, r)

Then after the while loop, we prove (3) from the assertion

(4)  reach = reachunder(x, current-g, current-t).
Let us next consider the function calls to balance and findlim.
In order to use the output assertions of these functions, we must
show that the input assumptions are satisfied. We therefore
include as part of the loop invariant the statement that there
exists an f such that:

(P4)   f : smap(nodes(g)) nodes(g) & nocycles(f) &
       A z in nodes(current-g) | z ~in dom f &
       A z in nodes(g) | f-lim(z) = froot(fcomp-lim(z)) &
           z ~in dom fcomp ->
           count(z) = #{y in nodes(g) | fcomp-lim(y) = z}

From the output assertions of the function calls, we can show
that (P4) is invariant.
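
The bodies of findlim and balance are not shown in this section, so the following Python sketch is only one conventional reading of (P4), offered as an assumption: fcomp is a compressed, size-balanced forest, count holds the class size at each root, and froot maps the root of each tree to the node that names the class, so that f-lim(z) = froot(fcomp-lim(z)). Treating the arguments w and x of balance as tree roots is also an assumption of the sketch.

```python
def findlim(fcomp, v):
    """Return (fcomp, h), where h is the root of v's tree in fcomp.

    Every node on the path from v to h is re-pointed at h (path compression).
    """
    path = []
    while v in fcomp:          # a root is a node outside dom fcomp
        path.append(v)
        v = fcomp[v]
    for z in path:
        fcomp[z] = v
    return fcomp, v


def balance(froot, fcomp, count, w, x):
    """Merge the tree rooted at w into the class of x (distinct roots assumed).

    The smaller tree is linked under the larger one; froot records that the
    merged class is still named by froot[x], whichever node ends up as root.
    """
    name = froot[x]
    small, big = (x, w) if count[w] > count[x] else (w, x)
    fcomp[small] = big
    count[big] += count[small]
    froot[big] = name
    return froot, fcomp, count


nodes = "abc"
froot = {z: z for z in nodes}
fcomp = {}
count = {z: 1 for z in nodes}
froot, fcomp, count = balance(froot, fcomp, count, "b", "a")   # class {a, b}
froot, fcomp, count = balance(froot, fcomp, count, "a", "c")   # merged into c's class
fcomp, h = findlim(fcomp, "b")
```

After the second merge the tree root is a (the larger tree), while froot still reports c as the name of the class; this separation of tree root and logical limit is the point of the froot conjunct in (P4).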

Finally,  we  must  establish  a  relationship  between  the 
implicit  graph  current-g  and  the  program  variables.  This 
relationship  is  obtained  from  Lemma  8,  Sec.   5.1.7. 

(P5)   current-g  =  {[f-lim(u),  v]  :  [u,  v]  in  g  |  v  ~in  dom  f 

&  f-lim(u)  ~=  v} 
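
Formula (P5) can be read directly as a set former. The sketch below is a Python illustration under the assumption that f is acyclic (the nocycles(f) conjunct of (P4)) and is stored as a dictionary; the names f_lim and collapsed_graph are hypothetical.

```python
def f_lim(f, z):
    """Follow the map f from z until a limit node (a node outside dom f) is reached."""
    while z in f:
        z = f[z]
    return z


def collapsed_graph(g, f):
    """current-g as in (P5): keep the edges of g that enter a limit node,
    move each source to its limit, and drop the resulting self-loops."""
    return {(f_lim(f, u), v) for (u, v) in g if v not in f and f_lim(f, u) != v}


g = {("r", "a"), ("a", "b"), ("b", "a"), ("b", "c")}
f = {"b": "a"}                      # node b has been collapsed into a
current_g = collapsed_graph(g, f)
```

With b collapsed into a, the edge a -> b disappears because b is no longer a limit node, the back edge b -> a becomes a self-loop and is dropped, and b -> c is reattached to a.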

We combine these assumptions to obtain the following loop
invariant LOOPINV1.

     P1 & P2 &
     E current-g, current-t, current-n, f | P3 & P4 & P5

Next we specify the invariant for the forall-loop at @2. The
forall-loop at @2 initializes the set new so that at the end of
the loop, the assertion:

(6)  new = {z | [z, x] in current-g & backedge([z, x],
                                               current-t)}

is available. We introduce the shadow variable TEMP1 to store
the values of backinv{x} processed at successive loop iterations.
The loop invariant LOOPINV2 is the following formula:

     P1 & P2 &
     E current-g, current-t, current-n, f |
     P3 & P4 & P5 & new = {f-lim(z) | z in TEMP1}.

Next we specify the invariant at the while loop at @4. This loop
computes the reachunder set of the backedge target x by a
transitive closure procedure. We therefore include the following
formula as part of the invariant:

(P6)   reachunder(g, t, x) + {x} = reach +
       {z : v in new | E p in paths(current-g-inv, v, z) |
            x ~in range p}
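
The shape of (P6) can be exercised at run time. The Python sketch below is a simplification and an assumption on our part: it works on an ordinary predecessor map instead of the collapsed graph current-g, and it takes the target set to be x together with every node that reaches a given source by a path avoiding x. The function names are hypothetical.

```python
def backward_closure(pred, sources, x):
    """Nodes from which some source is reachable by a path avoiding x (sources included)."""
    seen, stack = set(), [s for s in sources if s != x]
    while stack:
        z = stack.pop()
        if z not in seen:
            seen.add(z)
            stack.extend(u for u in pred.get(z, ()) if u != x)
    return seen


def reach_with_invariant(pred, sources, x):
    """Workset closure in the style of the while loop at @4, checking a (P6)-like invariant."""
    target = backward_closure(pred, sources, x) | {x}
    reach, new = {x}, set(sources) - {x}
    while new:
        # Invariant: the target is what has been reached so far plus
        # everything still reachable backwards from the workset.
        assert target == reach | backward_closure(pred, new, x)
        w = new.pop()
        reach.add(w)
        new |= {u for u in pred.get(w, ()) if u not in reach}
    assert reach == target          # the invariant with an empty workset
    return reach


pred = {"c": ["b"], "b": ["a", "x"], "a": ["x"]}
result = reach_with_invariant(pred, {"c"}, "x")
```

When the workset is empty the second term of the invariant vanishes, which is how the assertion about reach becomes available after the loop.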

We next note that the call to balance implicitly changes the
underlying map f, so that (P5) is not loop invariant (or more
precisely, (P4) and (P5) are not together loop invariant).
Therefore, we add the conjunct that there exists a map f' such
that:

(P7)   f' = f + {[w, x] : w in reach-{x}} &
       current-g = {[f'-lim(u), v] : [u, v] in g |
                    v ~in dom f' & f'-lim(u) ~= v}

Then LOOPINV3 is:

     P1 & P2 & E current-g, current-t, current-n, f, f' |
     P3 & P4 & P6 & P7

Finally, we specify the invariant for the forall-loop at @6.
This loop adds the set

     current-g-inv{w} - reach

to the set new. We introduce the shadow variable TEMP2 to store
the values of g-inv{w} used during successive iterations of the
forall-loop. LOOPINV4 is the formula:

     P1 & P2 & E current-g, current-t, current-n, f, f' |
     P3 & P4 & P6 & P7 &
     new1 = new + {head-lim{z} : z in TEMP2}

This completes the annotation of the praa (1). Next, we specify
the verification conditions which must be validated in order to
prove (1).

(VC1)   flowgraph(g, r) &
        r ~in range g &
        (A e in g | e(1) ~= e(2))
        ->
        flowgraph(g, r)

[This verifies the input assumption of dfst at @0.]


(VC2)   flowgraph(g, r) &
        r ~in range g &
        (A e in g | e(1) ~= e(2)) &
        [t, n] in depth-first-trees(g, r) &
        ndescs = {[z, numdescs(t, z)] : z in nodes(t)} &
        nodeset = nodes(g) &
        nodevect = {[v, u] : [u, v] in n} &
        backinv = {[v, u] : [u, v] in g | n(v) <= n(u) &
                   n(u) < n(v) + ndescs(v)} &
        targbackedges = dom backinv &
        fcomp = {} &
        froot = {[z, z] : z in nodeset} &
        count = {[z, 1] : z in nodeset}
        ->
        LOOPINV1

[This formula states that LOOPINV1 is satisfied when @1 is
entered.]


(VC3)   LOOPINV1 &
        ~(E y > k >= 1 | nodevect(k) in targbackedges) &
        (reduceflag <-> true)
        ->
        (reduceflag <-> reducible(g, r))

[This states that the output assertion is satisfied when the
loop terminates normally.]

(VC4)   LOOPINV1 &
        y1 = max / {y > k >= 1 | nodevect(k) in targbackedges} &
        x = nodevect(y1) &
        new = {} &
        TEMP1 = {}
        ->
        LOOPINV2

[This states that LOOPINV2 is available when the loop at @2 is
entered.]

(VC5)   LOOPINV2 &
        v in backinv{x} - TEMP1
        ->
        (A z in nodes(g) | f-lim(z) = froot(fcomp-lim(z))) &
        v in nodes(g)

[This states that the input assumption of findlim is satisfied.]

(VC6)   LOOPINV2 &
        v in backinv{x} - TEMP1 &
        (A z in nodes(g) | f-lim(z) = froot(fcomp-lim(z))) &
        h = f-lim(v) &
        new1 = new + {h} &
        TEMP11 = TEMP1 + {v}
        ->
        LOOPINV2(new \ new1, TEMP1 \ TEMP11)

[This states that LOOPINV2 is loop invariant.]

(VC7)   LOOPINV2 &
        TEMP1 = backinv{x} &
        reach = {x}
        ->
        LOOPINV3

[This states that LOOPINV3 is satisfied on entry to the loop at
@4.]

(VC8)   LOOPINV3 &
        new ~= {} &
        w in new &
        new1 = new - {w} &
        reach1 = reach + {w}
        ->
        w ~in dom f & x ~in dom f &
        (A z in nodes(g) | f-lim(z) = froot(fcomp(z)) &
            z ~in dom fcomp ->
            count(z) = #{u in nodes(g) | fcomp-lim(u) = z})

[This states that the input assumption of balance is satisfied at
@5.]

(VC9)   LOOPINV3 &
        new ~= {} &
        w in new &
        new1 = new - {w} &
        reach1 = reach + {w} &
        (A z in nodes(g) | froot'(fcomp'(w)) = x &
            fcomp-lim(z) = fcomp'-lim(z) &
            z ~in dom fcomp' ->
            count'(z) = #{y in nodes(g) | fcomp'-lim(y) = z})
        ->
        LOOPINV4

[This states that LOOPINV4 is satisfied on entry to the loop at
@6.]

(VC10)  LOOPINV4 &
        u in g-inv{w} - TEMP2
        ->
        (A z in nodes(g) | f-lim(z) = froot(fcomp-lim(z))) &
        u in nodes(g)

[This states that the input assumption of findlim is
satisfied at @7.]

(VC11)  LOOPINV4 &
        (A z in nodes(g) | f-lim(z) = froot(fcomp-lim(z))) &
        h = f-lim(u) &
        h in reach &
        TEMP21 = TEMP2 + {u}
        ->
        LOOPINV4(TEMP2 \ TEMP21)

[This states that LOOPINV4 is invariant when the if statement
is false.]

(VC12)  LOOPINV4 &
        (A z in nodes(g) | f-lim(z) = froot(fcomp-lim(z))) &
        h = f-lim(u) &
        h ~in reach &
        new1 = new + {h} &
        TEMP21 = TEMP2 + {u}
        ->
        LOOPINV4(TEMP2 \ TEMP21 & new \ new1)

[This states that LOOPINV4 is invariant when the if statement is
true.]

(VC13)  LOOPINV4 &
        TEMP2 = g-inv{w}
        ->
        LOOPINV3

[This states that LOOPINV3 is invariant.]

(VC14)  LOOPINV3 &
        new = {} &
        r in reach &
        (reduceflag <-> false)
        ->
        (reduceflag <-> reducible(g, r))

[This states that the output assertion is satisfied when false
is returned.]

(VC15)  LOOPINV3 &
        new = {} &
        r ~in reach &
        y1 = y + 1
        ->
        LOOPINV1(y \ y1)

[This states that LOOPINV1 is loop invariant.]


The  lemmas  given  in  section  5.2  can  now  be  used  to  assist  in 
the  proof  of  these  verification  conditions.  We  have  shown  that 
all  of  the  important  data  objects  which  were  introduced  and 
eliminated  in  the  course  of  the  algorithm  derivation  in  section 
5.1  must  be  introduced  here  to  prove  the  low  level  version. 
Moreover,  the  fact  that  we  have  already  derived  the  algorithm  has 
simplified  this  exercise;  proving  the  algorithm  (1)  without  an 
understanding  of  its  evolution  would  be  impossible,  and  proving 
it  in  any  case  is  still  quite  hard,  as  can  be  seen  from  the 
verification  conditions.  The  step-wise  transformational  approach 
provides  a  means  of  organizing  and  developing  the  proof  in  a 
gradual  manner. 


CHAPTER  6 
Conclusion 


As  the  case  study  in  Chapter  5  demonstrates,  the  formal 
verification  of  logically  complex  algorithms  poses  a  considerable 
challenge. While the final algorithm is quite short, it is of a
subtlety far exceeding the domain of comfortable applicability of
existing  verifiers.  Certainly,  the  majority  of  programs  written 
are  not  as  difficult  to  verify;  however,  it  is  precisely  the 
verification  of  those  algorithms  which  are  most  difficult  to 
understand  that  is  most  important.  In  coming  to  terms  with  these 
more  complex  proofs,  we  make  the  following  observation:  Much  work 
remains  to  be  done  in  designing  techniques  which  minimize  the 
repetitive  "non-creative"  aspects  of  automatic  verification;  it 
is  quite  important  to  free  the  user  of  an  interactive  verification 
system  from  tedious  low-level  details,  and  any  verification  system 
which  does  not,  will  not  be  practical  for  the  verification  of  more 
than  very  simple  programs. 

To  this  end,  we  have  specified  a  powerful  programming 
language  (as  well  as  logical  language)  based  on  set  theory, 
thereby minimizing the distinction between a high level
formulation of an algorithm and its mathematical statement and
simplifying the initial proof. As the nodal spans example in
section 4.2.2 demonstrates, a high level formulation consisting of
a  single  statement  can  be  directly  transformed  into  a  more 
conventional  program  by  applying  standard  compiler-type 
transformations.  We  have  defined  a  set  of  transformations  which 
enable  the  combination  of  verified  program  elements;  by  our  reuse 
of  several  general  praa  fragments,  in  particular  the  transitive 
closure  praa  and  the  compressed  balanced  tree  representation  for 
the  union-find  problem,  we  have  shown  that  this  combinatorial 
ability  is  indispensable.  Similarly,  a  high  level  specification 
from  which  implementation  details  are  deliberately  omitted  not 
only  facilitates  verification,  but  also  can  serve  as  a  root  for 
the  derivation  of  several  algorithm  variants,  e.g.  the  search 
algorithms  in  Chapters  1.5  and  the  minimum  cost  spanning  tree 
example  in  Chapter  4.2.1. 

Even  though  a  particularly  clever  high  level  algorithm 
formulation  and  the  use  of  program  transformations  can  alleviate 
the  difficulty  of  program  verification  (the  nodal  spans  algorithm, 
for  example,  was  derived  with  very  little  actual  theorem  proving), 
we  feel  that  most  of  the  difficulty  still  lies  in  proving  the 
verification  conditions.  We  believe  that  the  burden  of  formal 
proof  of  verification  conditions  can  also  be  alleviated  by  the  use 
of  a  powerful  logical  theory,  in  particular  set  theory,  and  by 
automating  as  many  routine  (or  almost  routine)  steps  as  possible. 
This includes continued research in the development of decision
algorithms for ground formulae of various subtheories, as well as
decision techniques for handling simple instantiations. These
issues are currently being investigated by Schwartz, Ferro,
Breban, and Omodeo [FOS79], [Schw79b]. The problem of general
set-theory instantiation (see Schwartz [Schw78]) is quite
difficult; however, a proof checker system in which the user only
has to supply non-trivial instantiations is itself an ambitious
undertaking, and will be quite a useful tool.

Building  a  system  which  realizes  the 
transformation/verification  rules  specified  in  this  thesis 
requires the integration of various standard, albeit substantial,
software components. In particular, a system would include:

(a)  A compiler front end which transforms the input text into
a parse tree, in a form which facilitates program transformations.
An unparser, which reconstructs program text from a parse tree, is
also necessary.

(b)  Pattern  matching  routines  for  performing  transformations 
on  a  parse  tree. 

(c)  A program analyzer. Since many transformation rules
require live/dead information, a system which implements our rules
must have a global analyzer which is capable of constructing
use-def chaining information and control flow information. While
nothing more elaborate is required in an initial implementation, a
more sophisticated analyzer might be used for the automatic
verification of simple assumptions. For example, global type
information can be used to verify assumptions such as:

     |= f : map.

The type of domain/range information obtained for automatic data
structure choice [SSS79] can be used to verify assumptions of the
form:

     |= f : map(s1) s2

These techniques are efficient but not complete.

(d)  A  verification  condition  generator,  as  described  in 
Chapter  3.2,  with  which  a  user  can  selectively  generate  a 
verification  condition  for  an  assumption  at  a  particular  program 
place.   This  particular  component  is  quite  straightforward. 

(e)  A  proof  checker.  The  proof  checker  will  be  the  most 
substantial  and  critical  component;  the  convenience  of  proof 
checking  will  very  much  affect  the  usefulness  of  the  system.  This 
component  will  itself  consist  of  many  subcomponents,  including 
formula  manipulation  routines  which  implement  the  natural 
deduction  framework,  various  decision  algorithms,  reduction  rules, 
a  method  for  handling  equality,  a  resolution  algorithm,  etc. 

(f)  An editor to facilitate the entering of new program texts
into the system.

(g)  A praa library. The system will maintain a growing
collection  of  verified  praas  which  can  be  used  in  various  contexts 
by  application  of  the  input  variable  substitution  rule.  A 
comprehensive  collection  of  useful  root  praas  and  data  structure 
representations  is  yet  to  be  gathered. 

(h)  A  transformation  library.  The  system  will  also  maintain 
a  growing  collection  of  correctness  preserving  transformations.  A 
transformation may be stored as a pair of parse trees T1 and T2
(see Paige [Pa79]). A general pattern matching routine then can
match the pattern tree T1 against an actual praa parse tree. When
a match is found, an expander routine replaces the matched portion
of the praa tree by T2. We should be able to add new derived
transformation  rules  to  the  library,  where  a  derived  rule  is  a 
sequence  of  existing  rules.  Suppose  we  wish  to  form  a  derived 
rule T1' => T2', which can be composed from rules R1,...,Rk. We
can either (i) implement it as a sequence of applications of
existing rules, that is, implement the new rule as a sequence of
calls to R1,...,Rk, or (ii) perform the sequence of
transformations R1,...,Rk directly on the transformation tree T1'
to obtain T2' as the result. This latter solution is more
efficient,  making  derived  transformations  appear  as  primitive 
transformations,  and  is  an  interesting  area  of  further  research. 

(i)  A  theorem  library. 

In maintaining large libraries where the user must name a
particular rule or library entry, a tradeoff must be made.
Namely,  it  becomes  increasingly  difficult  for  the  user  to  know 
exactly  what  the  libraries  contain.  Techniques  for  structuring 
and  cataloguing,  as  well  as  ways  of  determining  applicable  rules, 
will  be  indispensable. 

(j)  Finally,  specific  transformation  techniques  which  have 
been  developed,  such  as  recursion  removal  and  formal 
differentiation,  should  eventually  be  incorporated  into  the 
system.  Formal  differentiation  has  proved  to  be  a  widely 
applicable  paradigm.  Implementing  it  and  extending  it  to  other 
language  dictions  and  data  structures  would  greatly  enhance  our 
transformational  capabilities. 
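
The transformation library of item (h) can be sketched in a few lines. The Python below is an illustration only, under assumptions that are ours and not the thesis's: parse trees are nested tuples, pattern variables are strings beginning with "?", and a derived rule is built by option (i), that is, as a sequence of calls to existing rules. All names (match, expand, apply_rule, derived, R1, R2) are hypothetical.

```python
def match(pattern, tree, env=None):
    """Match pattern tree T1 against a parse tree; return variable bindings or None."""
    env = dict(env or {})
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in env and env[pattern] != tree:
            return None                      # same variable bound to two different subtrees
        env[pattern] = tree
        return env
    if isinstance(pattern, tuple) and isinstance(tree, tuple) and len(pattern) == len(tree):
        for p, t in zip(pattern, tree):
            env = match(p, t, env)
            if env is None:
                return None
        return env
    return env if pattern == tree else None


def expand(template, env):
    """Instantiate the replacement tree T2 with the bindings found by match."""
    if isinstance(template, str) and template.startswith("?"):
        return env[template]
    if isinstance(template, tuple):
        return tuple(expand(t, env) for t in template)
    return template


def apply_rule(rule, tree):
    """Replace the first (outermost, leftmost) subtree matching T1 by T2."""
    t1, t2 = rule
    env = match(t1, tree)
    if env is not None:
        return expand(t2, env), True
    if isinstance(tree, tuple):
        for i, sub in enumerate(tree):
            new, done = apply_rule(rule, sub)
            if done:
                return tree[:i] + (new,) + tree[i + 1:], True
    return tree, False


def derived(*rules):
    """Option (i): a derived rule is a sequence of calls to existing rules."""
    def run(tree):
        for rule in rules:
            tree, _ = apply_rule(rule, tree)
        return tree
    return run


R1 = (("+", "?a", 0), "?a")          # a + 0  =>  a
R2 = (("*", "?a", 1), "?a")          # a * 1  =>  a
simplify = derived(R1, R2)
result = simplify(("*", ("+", "x", 0), 1))
```

Option (ii) would instead apply R1,...,Rk once to the pattern tree T1' itself and store the resulting pair, so that the derived rule costs a single match at use time; the sketch does not attempt this.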

Further  extensions  include  features  for  handling  a  broader 
range  of  language  dictions.  For  example,  the  language  could  be 
made more high level by adding backtracking primitives [Sh79a].
Bauer [Ba78] suggests the use of a wide spectrum language which
contains very low level, Fortran-type dictions, as well as high
level set theoretic dictions. Schwartz [DS77] has also discussed
the  possibility  of  using  transformation  techniques  to  obtain 
assembly  language  programs  from  high  level  specifications. 

We  have  concentrated  on  the  verification  of  relatively  short, 
complex  algorithms,  and  therefore  have  not  explored  the 
verification  of  large  programs,  which  present  a  somewhat  different 
set of problems. For example, the verification of a compiler
presents the problem of specifying what a compiler should actually
do;  a  compiler  itself  is  its  own  definition.  The  area  of 
parallel  program  verification  also  presents  challenging  problems. 

Beyond  the  automation  of  the  more  or  less  obvious  programming 
techniques  with  which  this  thesis  deals,  lie  the  interesting  and 
more  difficult  questions  of  uncovering  rules  for  true  algorithm 
invention.  For  example,  Tarjan's  insight  in  discovering  his  fast 
interval  finding  algorithm  was  realizing  that  it  could  be 
formulated as a union-find problem. Our transformational
derivation in section 5.1 from a high level specification into his
algorithm involved careful user guidance to expose the union-find
operations. Once they are exposed, it is relatively easy to
automatically obtain the efficient implementation. Such questions
of algorithm synthesis are on the boundary of what can be imagined
as  implementable.  (Sharir  [Sh79a]  has  begun  to  discuss  a 
particular  rule  for  algorithm  synthesis.)  Whether  or  not  such 
subtle  algorithms  as  Tarjan's  graph  algorithms  could  ever  be 
automatically  invented,  we  believe  that  the  discovery  of  those 
rules  which  describe  algorithm  technique  will  enhance  the 
production of reliable programs.